Tuesday, September 22, 2009

Inside Git

This is an exploration of what goes on when I run some basic git commands. We start out by creating a new repository. .git/objects is the directory where git stores all its objects, and it contains no files initially.

$ mkdir myrepo
$ cd myrepo/
$ git init
Initialized empty Git repository in /Users/andersjanmyr/tmp/myrepo/.git/
$ find .git/objects -type f     # find all files in .git/objects
$ 

When a file is added to git, its contents are stored in the .git/objects directory under the name of their hash. The first two characters of the hash are used as the name of a subdirectory, and the rest become the file name. Worth noting is that the hash depends only on the content, not on the file name or a timestamp, so were you to run the commands on your computer, your results should be identical.

$ echo "A tapir has 14 toes" > tapir.txt
$ git add tapir.txt
$ find .git/objects -type f
.git/objects/12/a93608760777f50380a94b52e1b54ec69f4743
$ git hash-object tapir.txt
12a93608760777f50380a94b52e1b54ec69f4743
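
To make the "hash of the content" idea concrete, here is a minimal sketch that recomputes the hash without git. It assumes the classic SHA-1 object format, where git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the file content. The variable names are mine, and the script falls back to shasum where sha1sum is not installed.

```shell
#!/bin/sh
# Sketch: recompute a blob hash by hand, assuming the SHA-1 object format.
content='A tapir has 14 toes'

# echo appended a newline to the file, so the size is 19 + 1 bytes.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Pick whichever SHA-1 tool is available (sha1sum on Linux, shasum on macOS).
if command -v sha1sum >/dev/null 2>&1; then sha1=sha1sum; else sha1='shasum -a 1'; fi

# Hash the header "blob <size>\0" followed by the raw content.
hash=$(printf 'blob %s\0%s\n' "$size" "$content" | $sha1 | cut -d' ' -f1)

echo "size: $size"
echo "hash: $hash"
```

If the assumption about the object format holds, the printed hash should be the same one that git hash-object reports for tapir.txt.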

If you try to list the contents of the file with cat, you are out of luck, since it is stored in a compressed binary format (zlib). You should instead use the git command git cat-file. The file above is a blob, and its contents are what you would expect.

$ cat .git/objects/12/a93608760777f50380a94b52e1b54ec69f4743
xK??OR02`pT(I,?,R?H,V04Q(?O-?zi$ 
$
$ git cat-file -t 12a93608760777f50380a94b52e1b54ec69f4743
blob
$ git cat-file blob 12a936   # Using the first part of the hash is enough
A tapir has 14 toes
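
git cat-file has a few more switches that are handy when poking around. The sketch below works in a throwaway repository (the temporary directory and variable names are my own) and writes the blob directly with git hash-object -w, so nothing has to be added or committed first.

```shell
#!/bin/sh
# Sketch: store a blob in a scratch repository and inspect it with cat-file.
repo=$(mktemp -d)                      # throwaway directory
cd "$repo" && git init -q

# hash-object -w writes the object into .git/objects and prints its hash.
hash=$(echo "A tapir has 14 toes" | git hash-object -w --stdin)

type=$(git cat-file -t "$hash")        # object type
size=$(git cat-file -s "$hash")        # content size in bytes, header excluded
text=$(git cat-file -p "$hash")        # pretty-print, whatever the type is

echo "$type $size $text"
```

The -p switch is convenient because it works out the object type by itself, so the same command can be used on blobs, trees and commits.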
 

Even though the file is in the .git/objects directory, it is not committed yet, so high-level git commands such as git log cannot see it. git status, on the other hand, shows that the file is staged, that is, in the index.

$ git log
fatal: bad default revision 'HEAD'
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
# new file:   tapir.txt
#
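
The index itself can be inspected with git ls-files --stage. The following sketch, run in a scratch repository of my own making, shows that the staged entry simply records the file mode, the hash of the blob and the path.

```shell
#!/bin/sh
# Sketch: look at the index entry that "git add" creates.
repo=$(mktemp -d)                      # throwaway directory
cd "$repo" && git init -q
echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt

# Each index entry lists: mode, blob hash, stage number, path.
entry=$(git ls-files --stage)
mode=$(echo "$entry" | awk '{print $1}')
staged_hash=$(echo "$entry" | awk '{print $2}')
file_hash=$(git hash-object tapir.txt)

echo "$entry"
```

The hash in the index entry is the same one that git hash-object computes for the file, which is how git status can tell that the staged version and the working copy agree.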

When I commit the file, two more objects are added to the .git/objects directory.

$ git commit -m "added tapir file"
[master (root-commit) 7e38e95] added tapir file
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 tapir.txt
$ find .git/objects/ -type f
.git/objects//12/a93608760777f50380a94b52e1b54ec69f4743
.git/objects//7e/38e95d328287ea9d234a2affc4ed9e4510435a
.git/objects//e8/493a7e63154350f8c3d08a42e759132d9d2a39

One is a tree object and the other is a commit.

$ git cat-file -t 7e38
commit
$ git cat-file -t e849
tree
$

The commit contains the information that was recorded when I committed. Apart from the commit message and my personal info it contains a reference to the tree object that was created simultaneously with the commit.

$ git cat-file commit  7e38
tree e8493a7e63154350f8c3d08a42e759132d9d2a39
author Anders Janmyr <anders.janmyr@jayway.se> 1253590540 +0200
committer Anders Janmyr <anders.janmyr@jayway.se> 1253590540 +0200

added tapir file
$ 
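
Because the commit records an author, a committer and timestamps, its hash will differ from machine to machine, while the tree it points to depends only on the files. A small sketch of this, using the plumbing commands git write-tree and git commit-tree in a scratch repository. The identity "Tapir Fan" is made up for the example.

```shell
#!/bin/sh
# Sketch: two commits with the same tree but different timestamps.
repo=$(mktemp -d)                      # throwaway directory
cd "$repo" && git init -q
echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
tree=$(git write-tree)                 # build a tree object from the index

# Hypothetical identity; commit-tree needs one for the author and committer lines.
export GIT_AUTHOR_NAME="Tapir Fan" GIT_AUTHOR_EMAIL="tapir@example.com"
export GIT_COMMITTER_NAME="Tapir Fan" GIT_COMMITTER_EMAIL="tapir@example.com"

# Same tree and message, timestamps one second apart.
c1=$(GIT_AUTHOR_DATE="1253590540 +0200" GIT_COMMITTER_DATE="1253590540 +0200" \
     git commit-tree -m "added tapir file" "$tree")
c2=$(GIT_AUTHOR_DATE="1253590541 +0200" GIT_COMMITTER_DATE="1253590541 +0200" \
     git commit-tree -m "added tapir file" "$tree")

# Both commits point at the very same tree object.
t1=$(git rev-parse "$c1^{tree}")
t2=$(git rev-parse "$c2^{tree}")

echo "tree:    $tree"
echo "commits: $c1 $c2"
```

The two commit hashes differ even though the content is identical, which is why your commit hash will not match mine while the blob and tree hashes should.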

The tree object is stored in binary format and cannot be completely read without the help of git ls-tree. Now I can see that it contains a reference to the blob that was created initially, the tapir.txt file.

$ git cat-file tree e8493a7e63154350f8c3d08a42e759132d9d2a39
100644 tapir.txt?vw???KR?NƟGC$ 
$ git ls-tree e8493a7e63154350f8c3d08a42e759132d9d2a39
100644 blob 12a93608760777f50380a94b52e1b54ec69f4743 tapir.txt
$
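
The unreadable characters in the raw tree are the blob hash stored as 20 raw bytes instead of 40 hex digits. Here is a rough sketch that digs it out by hand, assuming the SHA-1 object format and a tree with a single entry (so that the hash is simply the last 20 bytes). The scratch repository and variable names are my own.

```shell
#!/bin/sh
# Sketch: extract the blob hash from the raw bytes of a one-entry tree.
repo=$(mktemp -d)                      # throwaway directory
cd "$repo" && git init -q
echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
tree=$(git write-tree)                 # build a tree object from the index
blob=$(git hash-object tapir.txt)

# A tree entry is "<mode> <name>", a NUL byte, then the hash as 20 raw bytes.
# With one entry, the last 20 bytes of the tree are the blob hash.
raw_hash=$(git cat-file tree "$tree" | tail -c 20 | od -An -tx1 | tr -d '[:space:]')

echo "from tree: $raw_hash"
echo "blob:      $blob"
```

git ls-tree does the same decoding for every entry, which is why it is the more practical tool.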

So how does git know which commit is the latest? In git lingo, the latest commit on the current branch is known as HEAD. If I look inside .git/HEAD, I see a reference to the current branch, and that reference in turn points to the latest commit.

$  cat ./.git/HEAD
ref: refs/heads/master
$ cat ./.git/refs/heads/master
7e38e95d328287ea9d234a2affc4ed9e4510435a
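
The two-step lookup can also be done with plumbing commands. The sketch below follows the reference by hand and compares the result with git symbolic-ref and git rev-parse. It assumes a fresh repository where references are stored as plain files (they can also be packed), and the identity passed to git commit is made up.

```shell
#!/bin/sh
# Sketch: resolve HEAD by hand and compare with the plumbing commands.
repo=$(mktemp -d)                      # throwaway directory
cd "$repo" && git init -q
echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
git -c user.name="Tapir Fan" -c user.email="tapir@example.com" \
    commit -q -m "added tapir file"

# By hand: HEAD names a ref, and the ref file holds a commit hash.
ref=$(sed 's/^ref: //' .git/HEAD)
by_hand=$(cat ".git/$ref")

# The same lookup with git commands.
symbolic=$(git symbolic-ref HEAD)      # which ref HEAD points to
resolved=$(git rev-parse HEAD)         # which commit that ref points to

echo "$symbolic -> $resolved"
```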

The .git/refs directory is where git's references live, both heads (branches) and tags.

$ find .git/refs
.git/refs
.git/refs/heads
.git/refs/heads/master
.git/refs/tags
$ git branch olle
$ find .git/refs
.git/refs
.git/refs/heads
.git/refs/heads/master
.git/refs/heads/olle
.git/refs/tags

Creating a new branch with git branch adds a file for the branch to the heads directory. Switching to it changes the contents of .git/HEAD.

$  cat ./.git/HEAD
ref: refs/heads/master
$ git checkout olle
Switched to branch 'olle'
$  cat ./.git/HEAD
ref: refs/heads/olle
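
Since a branch is just a reference to a commit, both steps can be imitated with plumbing commands. This is only a sketch of the bookkeeping, run in a scratch repository with a made-up identity. The real commands also do safety checks, and switching branches normally updates the working tree and index too, which is skipped here because both branches point at the same commit.

```shell
#!/bin/sh
# Sketch: create a branch and switch to it using only reference updates.
repo=$(mktemp -d)                      # throwaway directory
cd "$repo" && git init -q
echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
git -c user.name="Tapir Fan" -c user.email="tapir@example.com" \
    commit -q -m "added tapir file"

commit=$(git rev-parse HEAD)

# Roughly what "git branch olle" does: point a new head at the current commit.
git update-ref refs/heads/olle "$commit"

# Roughly what switching does to .git/HEAD: point it at the other branch.
git symbolic-ref HEAD refs/heads/olle

current=$(git symbolic-ref --short HEAD)
head_file=$(cat .git/HEAD)
olle_commit=$(git rev-parse olle)

echo "$head_file"
```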

Git, simple, but beautiful!

5 comments:

Anonymous said...

Thanks a lot for this article ! you helped me to recover my corrupted GIT repository !

note : I tipped 1$ on your article with the service tiptheweb.org

Anders Janmyr said...

I'm glad it helped you and thanks for the tip!

Anonymous said...

excellent article, much appreciated. Bookmarked and I'll be back...

Anders Janmyr said...

@anonymous, You're welcome, glad you liked it.

Glad anonymous user said...

A worthy read indeed, will come back for more interesting Git articles I see you have plenty of. :-)