The Tapir's Tale: git


Thursday, September 02, 2010

Using Git with Subversion

I had the unfortunate experience of having to use Subversion again after using Git for a long time. It is amazing how fast I can forget. After renaming a directory at the prompt, and the agony that goes with it, I decided to switch back to Git.

$ mv requester sampler
# svn agony after renaming a directory
$ svn st
?       sampler
!       requester
!       requester.rb
A       sampler.rb

The tool to use when using Git with Subversion is, of course, git svn.

git svn works very well as long as you remember that Subversion is not Git. It does not handle merging well, and it will bite you if you don't respect that. So what does this actually mean? It means:

Always keep the Git master in sync with Subversion

To do this you have two commands you can use.

# Rebases the commits from the upstream Subversion server with your local master.
$ git svn rebase

You should only git svn rebase in your Git master, and you should ALWAYS do it before you git svn dcommit anything to the Subversion repository. git svn rebase keeps your local master in sync with the upstream Subversion repository by pulling down the changes and replaying your local commits on top of them, kind of like svn update.

# Commits the changes you have in your local master to the upstream Subversion server.
$ git svn dcommit

When you have changes ready to commit, you commit them to Subversion with git svn dcommit. You should ALWAYS git svn rebase before you do the dcommit, or it will fail.

That's it! As long as you follow these two simple rules, your life with git svn will be easy. If you forget to follow them, you will be bitten. When you get bitten, the cool thing about Git is that even if you screw up, it is always possible to sort it out.

If that was all there was to it, there would be no reason to use Git instead of Subversion. Git really shines when it comes to branching and merging. You may create as many local branches as you like with git branch branch_name or git checkout -b branch_name. You can hack around in these local branches as much as you want and merge them together. But, before you merge them into the master branch, you must rebase with master! Not merge, rebase! Rebase means replay the commits on top of the named branch. It creates new commits with the same content, but with different SHA-1s.
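The claim that a rebase creates new commits with the same content can be checked in a throwaway repository. This is a minimal sketch that uses plain Git only: an ordinary commit on master stands in for what git svn rebase would pull down, and the temporary directory, file names, and user identity are placeholders invented for the illustration.

```shell
set -e
# Throwaway repository; nothing here touches Subversion.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Tapir"              # placeholder identity
git config user.email "tapir@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Work on a local branch.
git checkout -q -b bugfix
echo fix > fix.txt
git add fix.txt
git commit -q -m "fix"
before=$(git rev-parse HEAD)

# Stand-in for 'git svn rebase' bringing in an upstream change on master.
git checkout -q master
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "upstream change"

# Replay the bugfix commit on top of master.
git checkout -q bugfix
git rebase -q master
after=$(git rev-parse HEAD)

echo "SHA-1 before rebase: $before"
echo "SHA-1 after rebase:  $after"

# After the rebase, master can simply be fast-forwarded.
git checkout -q master
git merge -q --ff-only bugfix
```

The two SHA-1s differ because the replayed commit has a new parent, while fix.txt is unchanged.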

# Example session

(master)$ git svn rebase
(master)$ git checkout -b dev
hack, hack, hack, ...
(dev)$ git commit -am 'Commit the changes' 

(dev)$ git checkout master
(master)$ git checkout -b bugfix
hack, hack, hack, ..., done

(bugfix)$ git checkout master
(master)$ git svn rebase
(master)$ git checkout bugfix
(bugfix)$ git rebase master
(bugfix)$ git checkout master
(master)$ git merge --ff-only bugfix # --ff-only refuses any merge that is not a fast-forward.
(master)$ git svn dcommit
(master)$ git branch -D bugfix # delete the branch; it is not needed anymore

(master)$ git checkout dev
hack, hack, hack, ..., done

(dev)$ git checkout master
(master)$ git svn rebase
(master)$ git checkout dev
(dev)$ git rebase master
(dev)$ git checkout master
(master)$ git merge --ff-only dev # --ff-only refuses any merge that is not a fast-forward.
(master)$ git svn dcommit

Another thing to be aware of is that git svn dcommit creates an extra commit, so even if you haven't changed anything in the master you need to rebase the local branch with the master. This is only needed if you don't delete the branches after you are done with a commit.

In the example above, I ended with a git svn dcommit and I didn't remove the dev branch.

(master)$ git svn dcommit # from above
(master)$ git checkout dev
(dev)$ git rebase master # rebases the extra commit created by git svn dcommit

If you forget to rebase, or something else happens that hinders a clean merge into the master, you can always back out of it with git reset --hard.

(master)$ git svn dcommit
... failed miserably, because I failed to git svn rebase, bollocks!
(aa..88dd|MERGING)$ git reset --hard
(master)$ git svn rebase
(master)$ git svn dcommit # Nice and clean commit
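Backing out with git reset --hard can be tried without a Subversion server. The sketch below is only an approximation: it provokes an ordinary merge conflict between two local branches as a stand-in for a failed dcommit, and the repository, file name, and identity are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Tapir"              # placeholder identity
git config user.email "tapir@example.com"

echo "one" > tapir.txt
git add tapir.txt
git commit -q -m "base"

# Two branches change the same line, so merging them conflicts.
git checkout -q -b dev
echo "dev" > tapir.txt
git commit -q -am "dev change"

git checkout -q master
echo "master" > tapir.txt
git commit -q -am "master change"

# The merge stops halfway with a conflict.
git merge dev >/dev/null 2>&1 || echo "merge failed, backing out"

# Throw away the half-finished merge and return to the last commit.
git reset -q --hard
git status --porcelain
```

After the reset, git status prints nothing and tapir.txt is back to the version committed on master.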

To get started you need to clone a subversion repository.

$ git svn clone http://svn.example.com/project/trunk
$ cd trunk
(master)$ git ...

Now is a good time to start using Git. Get yourself anything by Scott Chacon, such as the book or the screencasts.

Tuesday, November 24, 2009

Under the Hood of git clone

When you clone a git repository, everything is automatically set up to allow you to fetch, pull, and push to and from the remote repository, origin. But what is really going on? The remote is configured with a few lines of configuration in the config file inside the .git/ directory.

Here’s how it works:

Create a new repository, called base, add a file to it, then commit.

$ mkdir base;cd base;git init
Initialized empty Git repository in /Users/andersjanmyr/tmp/repos/base/.git/
$ echo foo > bar.txt
$ git add .
$ git commit -m initial
[master (root-commit) 548d762] initial
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 bar.txt

Clone this repository, called klon.

$ cd ..
$ git clone base klon
Initialized empty Git repository in /Users/andersjanmyr/tmp/repos/klon/.git/

Initialize a new repository, called kopy.

$ mkdir kopy;cd kopy;git init
Initialized empty Git repository in /Users/andersjanmyr/tmp/repos/kopy/.git/

The difference in configuration between the klon and the kopy.

$ diff klon/.git/config kopy/.git/config
7,12d6
< [remote "origin"]
<       fetch = +refs/heads/*:refs/remotes/origin/*
<       url = /Users/andersjanmyr/tmp/repos/base
< [branch "master"]
<       remote = origin
<       merge = refs/heads/master

To set up the newly created repository to work the same way the clone does, all I have to do is to edit this file to make it look the same. This is not what git does, so let's do it the git way.

Fixing the remote configuration.

$ cd kopy
$ git remote add origin /Users/andersjanmyr/tmp/repos/base

This adds the [remote "origin"] entry to the config file.

[remote "origin"]
        url = /Users/andersjanmyr/tmp/repos/base
        fetch = +refs/heads/*:refs/remotes/origin/*

Fetching from the origin adds the remote heads to .git/.

$ git fetch
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /Users/andersjanmyr/tmp/repos/base
 * [new branch]      master     -> origin/master
$ find .git/refs
.git/refs/
.git/refs/heads
.git/refs/remotes
.git/refs/remotes/origin
.git/refs/remotes/origin/master
.git/refs/tags

Now I can check out the origin/master, but if I do it the normal way, the configuration will not be set up correctly to allow me to pull and push the way I can with the clone.

Checkout the master version with tracking information.

# DON'T RELY ON THIS: without --track, the tracking information is only added when branch.autosetupmerge allows it.
$ git checkout -b master origin/master

# This always adds the tracking information to the config file.
$ git checkout --track -b master origin/master
Branch master set up to track remote branch master from origin.
Already on 'master'

The following information is added to .git/config when --track is used.

[branch "master"]
        remote = origin
        merge = refs/heads/master
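Whether the tracking information really ended up in the config can be checked by asking Git for the upstream of the branch. The sketch below repeats the kopy setup in a temporary directory; the paths and identity are placeholders, and the @{upstream} syntax assumes a reasonably recent Git.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A base repository with one commit on master.
git init -q base
git -C base symbolic-ref HEAD refs/heads/master
git -C base config user.name "Tapir"              # placeholder identity
git -C base config user.email "tapir@example.com"
echo foo > base/bar.txt
git -C base add bar.txt
git -C base commit -q -m initial

# An empty repository that is wired up by hand.
git init -q kopy
cd kopy
git remote add origin "$work/base"
git fetch -q origin
git checkout -q --track -b master origin/master

# Ask git which branch master is tracking.
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "master tracks $upstream"
```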

That’s it! Now the .git/config file looks the same as if I had done a normal clone, but let’s continue. What do the entries in the config file mean?

Definition of the remote.

[remote "origin"]
        url = /Users/andersjanmyr/tmp/repos/base
        fetch = +refs/heads/*:refs/remotes/origin/*

Definition of the remote as git config commands.

$ git config remote.origin.url /Users/andersjanmyr/tmp/repos/base
$ git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'

The first part is just declaring the alias origin for the remote url (or local in this case :).

The second part of the definition is more interesting. It sets up the refspec that is used when you don’t provide one on the command line. This is extremely useful, since we rarely write a full refspec and many people don’t know what one is. In case you don’t know, the remote commands of git (push, pull, and fetch) take refspecs as their last parameters. It is just that we usually refer to only a small part of one.

Usage of the remote git commands.

git pull <options> <repository> <refspec>...
git fetch <options> <repository> <refspec>...
git push <options> <repository> <refspec>...

The format of a refspec parameter is an optional plus +, followed by the source ref src, followed by a colon :, followed by the destination ref dest.

It defines what dest object should be updated by the src object.

Example definition of refspec.

# The local
# +<src>:<dest>
+refs/heads/spike:refs/remotes/origin/master

In our day-to-day usage of git, we usually don’t use the full syntax of the refspec. Instead we just refer to simple names. Like this.

Day-to-day usage of refspecs.

# Push the local branch to the remote branch with the same name
$ git push origin
# Pull the master into the local master.
$ git pull origin master
# Fetch the master of the origin and put the result in the remote experimental
$ git fetch origin master:refs/remotes/origin/experimental

The above really means:

Definition of the branch, expanded

# Push the local branches to the remote branches with the same name (the exact default depends on push.default)
$ git push origin refs/heads/*:refs/heads/*
# Pull the master into the local master.
$ git pull origin refs/heads/master:refs/remotes/origin/master
# Fetch the master of the origin and put the result in the remote experimental
$ git fetch origin refs/heads/master:refs/remotes/origin/experimental
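A fully spelled-out refspec can be tried against a local repository. This is a minimal sketch of the fetch into origin/experimental; the temporary paths and the identity are placeholders, and it only shows that the destination ref ends up pointing at the same commit as the source ref.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# The repository to fetch from, with one commit on master.
git init -q base
git -C base symbolic-ref HEAD refs/heads/master
git -C base config user.name "Tapir"              # placeholder identity
git -C base config user.email "tapir@example.com"
echo foo > base/bar.txt
git -C base add bar.txt
git -C base commit -q -m initial

git init -q kopy
cd kopy
git remote add origin "$work/base"

# <src>:<dest>, master on the remote goes into refs/remotes/origin/experimental here.
git fetch -q origin refs/heads/master:refs/remotes/origin/experimental

want=$(git -C "$work/base" rev-parse refs/heads/master)
got=$(git rev-parse refs/remotes/origin/experimental)
echo "remote master:       $want"
echo "origin/experimental: $got"
```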

From the above syntax, it is also possible to decipher the obscure syntax used when deleting a remote branch. Deleting is the same as pushing an empty source to the remote branch, that is, pushing without giving a local branch.

Delete a remote branch.

# Delete the remote branch serverfix
git push origin :serverfix
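The empty-source push can be tried safely against a bare repository on disk. In this sketch remote.git, local, and the identity are names made up for the illustration; serverfix is created first so that there is something to delete.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the server.
git init -q --bare remote.git

git init -q local
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Tapir"              # placeholder identity
git config user.email "tapir@example.com"
echo foo > bar.txt
git add bar.txt
git commit -q -m initial
git remote add origin "$work/remote.git"

# Create the remote branch serverfix from the local master.
git push -q origin master:serverfix
before=$(git ls-remote --heads origin serverfix)

# Empty source, serverfix as destination: the remote branch is deleted.
git push -q origin :serverfix
after=$(git ls-remote --heads origin serverfix)

echo "before: ${before:-<none>}"
echo "after:  ${after:-<none>}"
```

Newer versions of Git also accept git push origin --delete serverfix, which does the same thing with a more readable spelling.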

Now, we are down to the last part of the configuration, the branch definition.

Definition of the branch

[branch "master"]
        remote = origin
        merge = refs/heads/master

The first part, branch.master.remote, tells git to use origin as the default remote for this local branch if none is given.

The second part tells git which remote branch to use when merging. This also affects pull and fetch. Depending on your settings of push.default, it will also affect push.

Hopefully this has clarified some of the intricacies of git remoting. Just remember that if you make a mistake, you can always fire up an editor and edit the config file directly.

I’ll finish up with some more commands that can be used to get information about the remote.

Additional remote commands, to explore a remote.

# Show all remote branches
$ git branch -r
  origin/cucumber
  origin/customercare-0.6.x
  origin/master

# Show all remotes verbosely
$ git remote -v
origin  /Users/andersjanmyr/tmp/repos/base (fetch)
origin  /Users/andersjanmyr/tmp/repos/base (push)

# Show info about the remote
$ git remote show origin
* remote origin
  Fetch URL: /Users/andersjanmyr/tmp/repos/base
  Push  URL: /Users/andersjanmyr/tmp/repos/base
  HEAD branch: master
  Remote branches:
    experimental stale (use 'git remote prune' to remove)
    master       tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)

# List the remote heads of the origin
$ git ls-remote --heads origin
548d7624f5385d36314e8ab61e61e8872c0bfe90        refs/heads/master

That’s it for today.

Thursday, September 24, 2009

Git undo, reset or revert?

If you have found this page you probably came here since you wanted to clear your working directory from all the changes that you have made.

The simple answer is:

# Clear working directory tree from all changes
$ git checkout -f HEAD

This is, however, not the best way to do it. A better way is:

# Clears the working directory tree, and stashes all the changes.
$ git stash

git stash allows you to get your changes back any time you need them in case you change your mind. It is also possible to inspect and manipulate the stashes.

# List all the stashes
$ git stash list
stash@{0}: WIP on admin_ui: 0c1a80a Removed annotation from JdbcAdminService, it is now explicity initialized in the applicationContext.
stash@{1}: WIP on admin_ui: 14e12e6 Added foreign keys for UserRole
stash@{2}: WIP on master: d188ecd Merge branch 'master' of semc-git:customercare
stash@{3}: WIP on master: 3763795 More work on user_details.
...

# Apply the latest stash, and remove it from the stack
$ git stash pop

# Apply a named stash, but leave it on the stack
$ git stash apply stash@{2} 

# Drop a stash
$ git stash drop stash@{3} 

# Clear the entire stash stack (almost never needed)
$ git stash clear

# A better way to purge the stash
$ git reflog expire --expire=30.days refs/stash
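The round trip of stashing and getting the changes back can be seen in a scratch repository. This is a minimal sketch; the repository, the extra line added to tapir.txt, and the identity are invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Tapir"              # placeholder identity
git config user.email "tapir@example.com"

echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
git commit -q -m "added tapir file"

# Make an uncommitted change, then stash it away.
echo "A tapir has a short trunk" >> tapir.txt
git stash -q
clean=$(git status --porcelain)           # empty: the working tree is clean again

# Change of mind: bring the change back and drop it from the stack.
git stash pop -q
restored=$(tail -n 1 tapir.txt)
echo "restored line: $restored"
```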

What about git reset then? It sounds like it should do about the same as git checkout -f HEAD. It doesn't. git reset is used for setting the current reference pointer, HEAD.

# Reset the latest commit, and leave the changes in the index.
$ git reset --soft HEAD^

# Reset the latest commit, and leave the changes in the working directory
$ git reset HEAD^

# Undo add, move the changes from the index to the working directory
$ git reset

# Reset the latest successful pull or merge
$ git reset --hard ORIG_HEAD

# Reset the latest failed pull or merge
$ git reset --hard

# Reset the latest pull or merge, into a dirty working tree
$ git reset --merge ORIG_HEAD
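The difference between the first three variants is where the changes end up. The sketch below walks through --soft followed by a plain reset in a scratch repository (the repository and identity are placeholders) and records what is staged after each step.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Tapir"              # placeholder identity
git config user.email "tapir@example.com"

echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
git commit -q -m "added tapir file"
echo "new line" >> tapir.txt
git commit -q -am "new line"

# Undo the latest commit, but keep its changes in the index.
git reset -q --soft HEAD^
staged=$(git diff --cached --name-only)

# Undo the add: move the changes from the index to the working directory.
git reset -q
still_staged=$(git diff --cached --name-only)
unstaged=$(git diff --name-only)

echo "staged after --soft:        $staged"
echo "staged after plain reset:   ${still_staged:-<nothing>}"
echo "unstaged after plain reset: $unstaged"
```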

You can do more things with reset, but the above covers the typical cases. And now to the last thing, git revert. What does it do? git revert creates a new commit that is the opposite of the commit it names.

# Show the commits
$ git log --oneline
4717a5c new line
7e38e95 added tapir file

# Revert the commit named, 4717a5c, and commit it.
$ git revert 4717a5c

# Revert the HEAD commit, but don't commit it
$ git revert -n HEAD
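That revert adds a commit instead of removing one can be checked by counting commits. This is a small sketch in a scratch repository that mirrors the two commits from the log above; the identity is a placeholder and --no-edit is used so that no editor opens.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Tapir"              # placeholder identity
git config user.email "tapir@example.com"

echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
git commit -q -m "added tapir file"
echo "new line" >> tapir.txt
git commit -q -am "new line"

# Create a third commit that is the opposite of the second one.
git revert --no-edit HEAD >/dev/null

git log --oneline
cat tapir.txt
```

The history now holds three commits, and tapir.txt is back to its original single line.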

Git is incredibly flexible and lets you control everything if you want to.

Tuesday, September 22, 2009

Inside Git

This is an exploration into what is going on when I run some basic git commands. We start out by creating a new repository. .git/objects is the directory where git stores all its objects, and it is empty initially.

$ mkdir myrepo
$ cd myrepo/
$ git init
Initialized empty Git repository in /Users/andersjanmyr/tmp/myrepo/.git/
$ find .git/objects -type f     # find all files in .git/objects
$

When a file is added to git it gets stored in the .git/objects directory under the name of its hash. The first two characters of the hash are used as the name of a subdirectory and the rest becomes the file name. Worth noting is that the hash is computed from the content, so were you to run the commands on your computer, your results should be identical.

$ echo "A tapir has 14 toes" > tapir.txt
$ git add tapir.txt
$ find .git/objects -type f
.git/objects/12/a93608760777f50380a94b52e1b54ec69f4743
$ git hash-object tapir.txt
12a93608760777f50380a94b52e1b54ec69f4743
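The mapping from hash to file name can be reproduced by splitting the hash by hand. This is a minimal sketch in a scratch repository; it assumes the object is still stored loose (not yet packed), which is the case right after git add.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt

# Split the hash into the directory part and the file part.
hash=$(git hash-object tapir.txt)
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)

echo "hash: $hash"
ls ".git/objects/$dir/$file"
```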

If you try to list the contents of the file, you are out of luck, since it is stored in a compressed binary format. You should instead use the git command git cat-file. The file above is a blob, and its content is what can be expected.

$ cat .git/objects/12/a93608760777f50380a94b52e1b54ec69f4743
xK??OR02`pT(I,?,R?H,V04Q(?O-?zi$ 
$
$ git cat-file -t 12a93608760777f50380a94b52e1b54ec69f4743
blob
$ git cat-file blob 12a936   # Using the first part of the hash is enough
A tapir has 14 toes

Even though the file is in the .git/objects directory it is not committed yet and it cannot be read by the high-level git commands such as git log. git status on the other hand will show that the file is staged, or in the index.

$ git log
fatal: bad default revision 'HEAD'
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
# new file:   tapir.txt
#

When I commit the file, two more objects are added to the .git/objects directory

$ git commit -m "added tapir file"
[master (root-commit) 7e38e95] added tapir file
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 tapir.txt
$ find .git/objects/ -type f
.git/objects//12/a93608760777f50380a94b52e1b54ec69f4743
.git/objects//7e/38e95d328287ea9d234a2affc4ed9e4510435a
.git/objects//e8/493a7e63154350f8c3d08a42e759132d9d2a39

One is a tree object and the other is a commit.

$ git cat-file -t 7e38
commit
$ git cat-file -t e849
tree
$

The commit contains the information that was recorded when I committed. Apart from the commit message and my personal info it contains a reference to the tree object that was created simultaneously with the commit.

$ git cat-file commit  7e38
tree e8493a7e63154350f8c3d08a42e759132d9d2a39
author Anders Janmyr <anders.janmyr@jayway.se> 1253590540 +0200
committer Anders Janmyr <anders.janmyr@jayway.se> 1253590540 +0200

added tapir file
$

The tree object is stored in binary format and cannot be completely read without the help of git ls-tree. Now I can see that it contains a reference to the blob that was created initially, the tapir.txt file.

$ git cat-file tree e8493a7e63154350f8c3d08a42e759132d9d2a39
100644 tapir.txt?vw???KR?NƟGC$ 
$ git ls-tree e8493a7e63154350f8c3d08a42e759132d9d2a39
100644 blob 12a93608760777f50380a94b52e1b54ec69f4743 tapir.txt
$
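The chain from commit to tree to blob can also be followed without copying hashes by hand. The sketch below does this in a scratch repository with a placeholder identity; it assumes a tree with a single entry, so the awk call simply picks the third column of the one line that git ls-tree prints.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Tapir"              # placeholder identity
git config user.email "tapir@example.com"

echo "A tapir has 14 toes" > tapir.txt
git add tapir.txt
git commit -q -m "added tapir file"

# Follow the references: commit -> tree -> blob.
commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
blob=$(git ls-tree "$tree" | awk '{print $3}')

echo "commit $commit"
echo "tree   $tree"
echo "blob   $blob"

# -p pretty-prints any object type, so no type argument is needed.
git cat-file -p "$blob"
```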

So how does git know what the latest commit is? In git lingo the latest commit is known as the HEAD. If I look inside .git/HEAD I see a reference, and this reference points to the latest commit.

$  cat ./.git/HEAD
ref: refs/heads/master
$ cat ./.git/refs/heads/master
7e38e95d328287ea9d234a2affc4ed9e4510435a

The .git/refs directory is where all the references of git live, heads and tags.

$ find .git/refs
.git/refs
.git/refs/heads
.git/refs/heads/master
.git/refs/tags
$ git branch olle
$ find .git/refs
.git/refs
.git/refs/heads
.git/refs/heads/master
.git/refs/heads/olle
.git/refs/tags

Creating a new branch with git branch shows that the branch is added to the heads directory, switching to it will change the .git/HEAD contents.

$  cat ./.git/HEAD
ref: refs/heads/master
$ git co olle
Switched to branch 'olle'
$  cat ./.git/HEAD
ref: refs/heads/olle

Git, simple, but beautiful!

Sunday, March 08, 2009

Git it on

I switched to Git as my main version control system about five months ago and I haven’t looked back. I’m currently working on a Javascript project where I have to test and develop code both on the Mac and on Windows VMware images. I have usually had problems with this kind of setup, not so with Git.

Git Basics

I set up my git repository with the usual:

# Create directory
mkdir projectname

# Change to it
cd projectname

# Initialize the repository
git init

Then I add a .gitignore file with the specifics for this project.

# Local .gitignore
dist
pkg

And then I add the file to the project and commit it.

# Add the .gitignore file
git add .gitignore

# Commit the changes
git commit -m 'initial'

My global ignore file, $HOME/.gitignore, contains everything that I have in common for all projects.

# Global $HOME/.gitignore
*~
.DS_Store
tags
*.gz

Git works with changes. This is important: you are not actually adding a file to the repository, you are adding the changes. This means that you will have to add files every time you have made a change to them. I like this since it gives me an extra level of control, but if you don’t, you can always use the -a option when you commit and it will include the changes to files that have been added to the repository at least once.

# Add and commit the changes
git commit -a -m 'changes to all added files'

After setting up the initial repository and adding the .gitignore file, I am set to go. I usually start with creating a new branch for development.

# Create the branch but don't switch to it.
git branch dev 

# Or, create the branch and switch to it.
git co -b dev 

# The co above is an alias for checkout, it was created like this
git config --global alias.co "checkout"

After creating, and moving to, this branch I create another branch for the task that I am going to work on. For example

# Create a new branch, setup_tests
git co -b setup_tests

I do what I have to do to get the task done and then I switch to the dev branch and merge the changes in.

# Switch branch to dev
git co dev

# Merge in the changes from setup_tests
git merge setup_tests

If at any point I feel like I need to do something that doesn’t have to do with this task, such as add some utility functions, I just commit the current branch and start a new one from where I am. This gives me very fine-grained control over the source code and it lets me throw away changes that go bad without parsing and removing the bad changes manually.

If I forget to do my fine-grained branching, Git allows me to just add some of the changes from a file. The simplest way to do this is via the interactive add command.

# Enter git interactive add 
git add -i

Another command I find myself using more and more is stash. If the change that I need to do is totally unrelated to what I am doing now and I just need to fix it, I stash my current branch on the stash-stack and move to the branch where I need the change.

# Saves the changes on the stash stack
git stash

# Switch branch, change, and commit
git co dev
git commit -a -m 'urgent change'

# Switch back, and apply the stash.
git co setup_tests
git stash apply

Like I said, the stash is a stack and the command git stash apply applies the top of the stack. If you have the need to apply a stash that is not on top, this is also possible. See the help, git help stash, for information on this.

Cross-VM Workflow

When I need to test my changes on Windows, especially on the awful IE6, I switch to this virtual machine and then I clone the repository.

# Clone the repository
git clone path_to_my_local_mac_directory

Now I have a perfect copy of my repository and I can test and make the needed changes at will. After this I just push the changes back to the Mac again and pull them back to Windows from now on.

# Pushes the current branch
git push 

# Pull the changes back
git pull

I usually find doing this in the development branch the best way to go and then I reserve the merging into the master branch to the repository that I have designated as main, the one on the Mac.

At the moment I am developing on my main repository and there is one caveat with this. If you push changes into this repository, the working directory will not be updated. Thus, if you have local changes in your working directory and just commit them without checking the status, you will revert the changes that were pushed from the other repository. This is by no means a fatal problem since you can just revert your changes and move on, but it is still annoying. The lesson to learn is to always use git status before you commit.

# Shows the status of the repository
git status

Parallel Branches

In addition to working on a project with Javascript, I am also developing a course in it. It is called Javascript, the Esperanto of the Web because that is what Javascript has become. When Java didn’t cut it as the language for the web, Javascript was there and that was all that it took.

When developing this course I have the usual issues with keeping questions in sync with the answers. Git solves this without any problems at all. All I do is keep two branches of the tree, master and solution. I do all the course development in the solution branch and then I just merge it into the master branch, where I remove all the correct answers. After that I move back to developing in the solution branch.

If you want to get started with Git I can highly recommend the series of screencasts at “Gitcasts”. Start with the RailsConf Git Talk; it gives a good overview. If you’re on Windows and like GUI tools, move on to the Git on Windows Talk. I use the Cygwin version of Git on Windows since I have these tools installed anyway.