Skip to content
Snippets Groups Projects
i18n.txt 2.87 KiB
Newer Older
  • Learn to ignore specific revisions
  • Jean-Noël Avila's avatar
    Jean-Noël Avila committed
    Git is to some extent character encoding agnostic.
    
     - The contents of the blob objects are uninterpreted sequences
       of bytes.  There is no encoding translation at the core
       level.
    
     - Path names are encoded in UTF-8 normalization form C. This
       applies to tree objects, the index file, ref names, as well as
       path names in command line arguments, environment variables
       and config files (`.git/config` (see linkgit:git-config[1]),
       linkgit:gitignore[5], linkgit:gitattributes[5] and
       linkgit:gitmodules[5]).
    +
    Note that Git at the core level treats path names simply as
    sequences of non-NUL bytes, there are no path name encoding
    conversions (except on Mac and Windows). Therefore, using
    non-ASCII path names will mostly work even on platforms and file
    systems that use legacy extended ASCII encodings. However,
    repositories created on such systems will not work properly on
    UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
    Additionally, many Git-based tools simply assume path names to
    be UTF-8 and will fail to display other encodings correctly.
    
     - Commit log messages are typically encoded in UTF-8, but other
       extended ASCII encodings are also supported. This includes
       ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
       EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
       EUC-x, CP9xx etc.).
    
    Although we encourage that the commit log messages are encoded
    in UTF-8, both the core and Git Porcelain are designed not to
    force UTF-8 on projects.  If all participants of a particular
    project find it more convenient to use legacy encodings, Git
    does not forbid it.  However, there are a few things to keep in
    mind.
    
    . 'git commit' and 'git commit-tree' issues
      a warning if the commit log message given to it does not look
      like a valid UTF-8 string, unless you explicitly say your
      project uses a legacy encoding.  The way to say this is to
    
    Jean-Noël Avila's avatar
    Jean-Noël Avila committed
      have `i18n.commitEncoding` in `.git/config` file, like this:
    
    Jean-Noël Avila's avatar
    Jean-Noël Avila committed
    +
    ------------
    [i18n]
    
    	commitEncoding = ISO-8859-1
    
    Jean-Noël Avila's avatar
    Jean-Noël Avila committed
    ------------
    +
    Commit objects created with the above setting record the value
    
    of `i18n.commitEncoding` in its `encoding` header.  This is to
    
    Jean-Noël Avila's avatar
    Jean-Noël Avila committed
    help other people who look at them later.  Lack of this header
    implies that the commit log message is encoded in UTF-8.
    
    . 'git log', 'git show', 'git blame' and friends look at the
      `encoding` header of a commit object, and try to re-code the
      log message into UTF-8 unless otherwise specified.  You can
      specify the desired output encoding with
    
      `i18n.logOutputEncoding` in `.git/config` file, like this:
    
    Jean-Noël Avila's avatar
    Jean-Noël Avila committed
    +
    ------------
    [i18n]
    
    	logOutputEncoding = ISO-8859-1
    
    Jean-Noël Avila's avatar
    Jean-Noël Avila committed
    ------------
    +
    If you do not have this configuration variable, the value of
    
    `i18n.commitEncoding` is used instead.
    
    Jean-Noël Avila's avatar
    Jean-Noël Avila committed
    
    Note that we deliberately chose not to re-code the commit log
    message when a commit is made to force UTF-8 at the commit
    object level, because re-coding to UTF-8 is not necessarily a
    reversible operation.