Getting Started With Git
All of our source code is managed with the Git version control system, and all developers are encouraged to familiarise themselves with it.
Why Git?
The main reason that we've decided to use Git is because of GitHub, which — unlike predecessors like Sourceforge and Google Code — recognises that coding is very much a social endeavour and provides a number of features to facilitate this.
The GitHub home page gives a reasonable indication of the goodies that await you.
Git is a distributed version control system (DVCS) and allows for decentralised development without much hassle. Not only does this reflect the Espian ethos, but it paves the way for an eventual Ampify-based DVCS.
Git also outshines centralised version control systems like Subversion in terms of performance. This is mainly because Git has an entire copy of the repository available locally and so doesn't have to make needless network requests, e.g.
$ time git log > /dev/null
0.352s
$ time svn log > /dev/null
3.709s
Git is space efficient too. For example, the Git clone of the Django project is smaller than a Subversion checkout!! This is surprising because the Git clone has the entire repository history with every revision, whilst the Subversion checkout is only of a single revision!
$ du -d 1 -h
108M ./django-bzr
44M ./django-git
53M ./django-hg
53M ./django-svn
As for other competing DVCS systems, Mercurial has the nice feature of being coded entirely in Python and being very extendable.
Git and Mercurial are very similar. In fact their object models are so alike that there's an hg-git Mercurial plugin. However, besides lacking a decent GitHub equivalent, Mercurial also currently lacks two really nice features of Git:
- Super cheap/quick branching
- Rebasing
Eventually Mercurial may fix its broken branching model, but for now Git shines amongst the current flock of DVCS. And GitHub, with its social focus, makes it a true pleasure to use.
Having said all this, Git's user interface sucks, and the worst of it is the appalling error messages it gives you. Thankfully this has gotten better in the 1.6.x versions and will hopefully continue to improve with time.
First Steps
If you don't have an account already, go sign up on GitHub. It only takes a minute and they provide a very generous free account. Whilst a pain, please also add a profile image for the email address you signed up with using GitHub's sister service Gravatar. And, finally, upload your SSH key to GitHub and you're all set to go!
The next step is to install Git if you don't already have it. It is RECOMMENDED that you install version 1.7+ of Git as its user interface is much, much better.
Then tell Git who you are. This is recorded as the author of every commit you make, so provide your real info, e.g.
$ git config --global user.name "tav"
$ git config --global user.email tav@espians.com
Behind the scenes this will update your ~/.gitconfig file. You might also want to extend this file with something like the following:
[user]
email = tav@espians.com
name = tav
[color]
diff = auto
status = auto
branch = auto
interactive = auto
ui = auto
[github]
user = tav
token = 0cc175b9c0f1b6a831c399e269772661
[alias]
ch = checkout
co = commit
st = status
lp = log -p
diffall = diff HEAD
diffstaged = diff --staged
graph = log --date=\"short\" --format=\"%C(yellow)%h%Creset [%cN] %C(white)%ad%Creset %s\" --graph
filelog = log --oneline --no-merges --
unstage = reset HEAD
[push]
default = matching
[core]
webKitBranchBuild = true
[ampify]
branch-build = true
The [color] section provides colour output when you run various git commands which can be quite useful.
The [alias] section allows you to alias shortcuts, e.g. with the above alias definition, you can now just run git graph instead of the full git log --date="short" --format="%C(yellow)%h%Creset [%cN] %C(white)%ad%Creset %s" --graph. In combination with the Git bash completion script, aliases will make life much easier for your fingers.
The [github] section is used by special GitHub powered tools — you can find your token on your GitHub account page. One GitHub powered tool that's super cool is Gist — the best pastebin out there, with all of your pastes automatically becoming Git repositories! Gist also has a number of command line and editor interfaces:
- Gist command-line (Ruby version)
- Gist command-line (Python version)
- Emacs M-x gist-buffer support
- Vim :Gist support
A core.autocrlf = true setting tells Git to convert newlines to the system's standard when checking out files, and to LF newlines when committing changes:
$ git config core.autocrlf true
And, finally, the core.webKitBranchBuild = true and ampify.branch-build = true settings respectively tell the various WebKit and Ampify build scripts to append the name of the Git branch you are in to the build directory. This is especially useful as it stops you clobbering your previous branch's build when you switch branches.
You can further configure a local Git repository using git config <param> <value>, e.g.
$ git config branch.autosetuprebase always
This will update a .git/config file inside a local repository. If you want to set the configuration globally, i.e. in your ~/.gitconfig file, you can pass the optional --global flag, e.g.
$ git config --global branch.autosetuprebase always
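To make the difference between the two levels concrete, here is a small sketch that sets the option at the repository level only and reads it back. It runs in a throwaway repository under a temporary directory (an assumption made purely for illustration), so your real ~/.gitconfig is left alone.

```shell
# Work in a throwaway repository so no real configuration is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, the setting lands in this repository's .git/config.
git config branch.autosetuprebase always

# Read it back, restricting the lookup to the repository-level file.
local_value=$(git config --local --get branch.autosetuprebase)
echo "repository-level value: $local_value"

# The raw file now contains a [branch] section holding the setting.
grep -A 1 '^\[branch\]' .git/config
```

Running the same git config command with --global would write the identical section to ~/.gitconfig instead, where it applies to every repository that doesn't override it.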
Cloning
One usually starts off by cloning a Git repository. GitHub provides a range of different access methods (transport protocols) which support either read-only or read/write access. For example, if we take the Ampify repository, some of the access URIs are:
https://github.com/tav/ampify.git # read-only
git://github.com/tav/ampify.git # read-only
git@github.com:tav/ampify.git # read/write
For “anonymous” read-only clones, it's recommended that you use the HTTP/HTTPS smart transport as long as you have a recent 1.7+ version of Git, e.g.
$ git clone https://github.com/tav/ampify.git
If you have read/write access, you should instead clone the repository by running:
$ git clone git@github.com:tav/ampify.git
Running either of the above commands will clone the repository into a newly created ampify directory inside whatever directory you executed the command in, e.g.
Initialized empty Git repository in /Users/tav/ampify/.git/
remote: Counting objects: 7778, done.
remote: Compressing objects: 100% (5837/5837), done.
Receiving objects: 7% (545/7778), 1.47 MiB | 366 KiB/s
Once the command has finished running, you'll find that there's a fresh “checkout” of the repository inside the ampify directory. You'll also find a special .git directory which actually contains the repository (config, objects and metadata).
Of special note is the .git/config file which will look something like:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
url = git@github.com:tav/ampify.git
[branch "master"]
remote = origin
merge = refs/heads/master
What Git has effectively done is:
- Grabbed a copy of the repository.
- Set up a new remote reference called origin which points to the original repository.
- Linked your local master branch to the master branch on origin (called “tracking”).
Note: Having a remote named “origin” and a branch named “master” is just the default Git convention — you can name your remotes and branches however you want.
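If you'd like to see these three steps without touching the network, the following sketch clones a small local repository instead of one on GitHub. The directory names (hosted, my-clone), the file and the author details are all made up for the example.

```shell
# Throwaway identity so the example commit works on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# A local repository standing in for the one hosted on GitHub.
git init -q hosted
cd hosted
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this guide
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit."
cd ..

# Clone it, then inspect what the clone recorded in its .git/config.
git clone -q hosted my-clone
cd my-clone
remote_name=$(git remote)
tracked_remote=$(git config --get branch.master.remote)
tracked_branch=$(git config --get branch.master.merge)
echo "remote: $remote_name"
echo "master tracks: $tracked_remote $tracked_branch"
```

The two git config lookups read exactly the [branch "master"] lines shown in the .git/config listing above.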
Remotes
Remotes are just references to other repositories. In Git you are always operating and committing on your local repository. Interaction with remote repositories is generally limited to simply fetching/pushing changes.
And since these remotes are in relation to just your local repository, you can set up as many of these as is relevant without affecting anyone else's setup. Now you might wonder why you need multiple remotes to be configured. For this, let's take a look at the webkit repository.
Our webkit repository is actually a fork of appcelerator/webkit_titanium.
Just like other developers would have to with our repositories, we had to fork appcelerator/webkit_titanium since we only have read-only access to it. But when we run the following command to clone our repository, it doesn't come with any references to the original Appcelerator repository:
$ git clone git@github.com:tav/webkit.git
The .git/config in the newly created repository directory would have something like the following in it:
[remote "origin"]
url = git@github.com:tav/webkit.git
fetch = +refs/heads/*:refs/remotes/origin/*
Now if a developer wants to keep an eye on changes that Appcelerator make, and possibly bring some of those changes into our repository, they should set up a new remote for it on their local repository.
So let's call this new remote upstream and point it at the read-only reference for the Appcelerator repository:
$ git remote add upstream https://github.com/appcelerator/webkit_titanium.git
The .git/config would now have a few additional lines:
[remote "upstream"]
url = https://github.com/appcelerator/webkit_titanium.git
fetch = +refs/heads/*:refs/remotes/upstream/*
You can see a list of the remotes you've set up by running:
$ git remote show
origin
upstream
And get further detail on any of them using:
$ git remote show origin
* remote origin
Fetch URL: git@github.com:tav/webkit.git
Push URL: git@github.com:tav/webkit.git
HEAD branch: master
Remote branches:
master tracked
pypy-integration tracked
Local branch configured for 'git pull':
master merges with remote master
Local refs configured for 'git push':
master pushes to master (up to date)
pypy-integration pushes to pypy-integration (up to date)
The previous command does a remote query over the network, so you can pass an additional -n parameter to just use the cached data instead, i.e.
$ git remote show -n origin
And, finally, to keep your copies of all of your remote repositories updated, simply run:
$ git remote update
Updating origin
remote: Counting objects: 73, done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 64 (delta 38), reused 0 (delta 0)
Unpacking objects: 100% (64/64), done.
From git@github.com:tav/webkit
7e329e4..d262952 master -> origin/master
Updating upstream
From https://github.com/appcelerator/webkit_titanium
* [new branch] master -> upstream/master
* [new branch] titanium_1.0 -> upstream/titanium_1.0
* [new branch] titanium_1.0_win32_osx -> upstream/titanium_1.0_win32_osx
* [new branch] titanium_pr4 -> upstream/titanium_pr4
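The origin/upstream arrangement can be tried out locally too. In this sketch three directories stand in for the Appcelerator repository (original), our fork of it (fork) and a developer's clone (local); all of these names are assumptions for the example, not real repositories.

```shell
# Throwaway identity so the example commit works on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# "original" plays the part of the repository we only have read access to.
git init -q original
cd original
git symbolic-ref HEAD refs/heads/master
echo "webkit" > README.txt
git add README.txt
git commit -q -m "Initial commit."
cd ..

# "fork" is our copy of it, and "local" is a developer's clone of the fork.
git clone -q original fork
git clone -q fork local
cd local

# The clone only knows about origin, so add the original as a second remote.
git remote add upstream "$work/original"
git remote update

remotes=$(git remote)
origin_tip=$(git rev-parse origin/master)
upstream_tip=$(git rev-parse upstream/master)
echo "$remotes"
git branch -r
```

Since nothing has diverged yet, origin/master and upstream/master point at the same commit; once the original moves ahead, only upstream/master would change after the next update.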
Basic Workflow
Whilst remotes are important for sharing changes with other developers, it's git add and git commit that you'll be using to make changes to your local repository.
Unlike in Subversion, where you do svn add only once to add files to the repository, Git distinguishes your working directory from what it calls the “staging area” or “index file”.
So your general workflow would be to edit away in the working directory/tree, then add the changes you like to the staging area, and then commit the changes.
For example, let's say you started a few edits — of 2 files which were already in the repository and one which was a brand new file:
$ edit documentation/credits.txt # already in the repository
$ edit documentation/install.txt # already in the repository
$ edit documentation/tests-guide.txt # not yet in the repository
Then to see what's changed:
$ git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: documentation/credits.txt
# modified: documentation/install.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# documentation/tests-guide.txt
no changes added to commit (use "git add" and/or "git commit -a")
You can then add just the credits.txt file to the staging area:
$ git add documentation/credits.txt
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: documentation/credits.txt
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: documentation/install.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# documentation/tests-guide.txt
You could “unstage” the file from the staging area by using:
$ git reset HEAD documentation/credits.txt
But normally you'd just follow the git add with an informative commit:
$ git commit -m "Added myself to the credits list."
[master 593ff32] Added myself to the credits list.
1 files changed, 5 insertions(+), 0 deletions(-)
This gives you a brief overview of the commit along with 593ff32 — a shortened version of the full 40-character hexadecimal SHA-1 identifier for the commit/revision. You can use this in many places, e.g.
$ git show 593ff32
commit 593ff32c3d865ab0787f21e0b86eed4752ef063a
Author: tav <tav@espians.com>
Date: Thu Jul 9 23:22:00 2009 +0100
Added myself to the credits list.
diff --git a/documentation/credits.txt b/documentation/credits.txt
index fa2ce46..b94e35f 100644
--- a/documentation/credits.txt
+++ b/documentation/credits.txt
@@ -8,6 +8,11 @@ Credits
+* **tav**
+
+ * tav
+ * tav@espians.com
+
*
You don't always need to explicitly re-add files which are already in the repository to the staging area. If you are happy to commit all changes to the files that are already in the repository, you can just pass the -a flag to git commit:
$ git commit -am "Updated the install instructions for OS X."
[master ab408af] Updated the install instructions for OS X.
1 files changed, 2 insertions(+), 0 deletions(-)
In the above example this would've only added and committed install.txt since tests-guide.txt wasn't already known to the repository. You'd need to add it to the repository using the normal:
$ git add documentation/tests-guide.txt
Or if you wanted to add all untracked files to the index (staging area), you could just run:
$ git add .
This will add all files underneath the current directory which weren't excluded by patterns in any applicable .gitignore files, including those in parent directories leading all the way up to the root working directory (i.e. the directory with the .git subdirectory).
We have relevant .gitignore files set in all of our repositories, but a decent general one to use is:
# general hidden files/directories
.DS_Store
.sconsign*
.svn
# file patterns
*.dylib
*.la
*.lo
*.o
*.pyc
*.pyo
*.so
*.tar.gz
*.tar.bz2
*~
# file patterns (xcode)
*.mode1v3
*.mode2v3
*~.nib
*.pbxuser
*.perspective
*.perspectivev3
*.swp
*.tm_build_errors
# file patterns (windows)
*.dll
*.exe
*.ilk
*.lib
*.ncb
*.pdb
*.suo
*.vcproj.*.*.user
In contrast to adding files, you can of course remove files from the project by running git rm <files>. This will remove the physical file(s) from your working tree as well as add a deletion to your staging area for the next commit.
If you've manually deleted a bunch of files already but not told Git about it, you can run git rm $(git ls-files --deleted) to remove all the files which you've deleted but are still referred to in the project.
And for those instances when you don't want to delete the physical file, but do want to remove it from the project, you can run git rm --cached <files>, e.g.
$ ls *.txt
README.txt
$ git rm --cached README.txt
rm 'README.txt'
$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: README.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# README.txt
$ git commit -m "Removed the pointless README.txt file."
[master 85a2f0b] Removed the pointless README.txt file.
1 files changed, 0 insertions(+), 38 deletions(-)
delete mode 100644 README.txt
$ ls *.txt
README.txt
As you can see, with git rm --cached, the physical file is still there — just not in the project… whilst with a plain git rm it'd have been deleted from the working tree too.
And at any point you could have checked the exact changes in files by using:
$ git diff
By default this shows the changes in your working tree which have not yet been staged for the next commit. If you want to see changes between the staging area and your last commit, i.e. what you would be committing if you ran git commit without the -a option:
$ git diff --staged
And to see all changes in your working tree since your last commit, i.e. what you would be committing if you ran git commit with the -a option:
$ git diff HEAD
The beauty of the separate staging area is that it makes cool features like git add -p possible, which lets you add only specific hunks of your changes in a file to the staging area for the next commit!
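The three flavours of git diff are easiest to tell apart side by side. This sketch recreates the credits.txt/install.txt situation from above in a scratch repository (the file contents and author details are invented) and records which files each command reports.

```shell
# Throwaway identity so the example commit works on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir documentation
echo "Credits" > documentation/credits.txt
echo "Install" > documentation/install.txt
git add documentation
git commit -q -m "Added the initial documentation."

# Edit both tracked files, but stage only one of them.
echo "* tav" >> documentation/credits.txt
echo "Run make." >> documentation/install.txt
git add documentation/credits.txt

staged=$(git diff --staged --name-only)      # what a plain "git commit" would record
unstaged=$(git diff --name-only)             # changes that exist only in the working tree
all_changes=$(git diff HEAD --name-only)     # what "git commit -a" would record

echo "staged:   $staged"
echo "unstaged: $unstaged"
echo "all:      $all_changes"
```

Dropping --name-only gives the full patches rather than just the file names.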
Revision Identifiers
Earlier we saw that each commit has a unique identifier. The identifier in the example was 593ff32c3d865ab0787f21e0b86eed4752ef063a. You can use any unique prefix of the identifier in various commands. For most reasonably-sized repositories, the first 7 characters, e.g. 593ff32, are good enough to be unique within the repository.
These identifiers are permanent and will always point to the same commit. Git also provides support for more transient identifiers in the way of “branches” and “tags”. Collectively we can refer to all of these identifiers as “revision identifiers” and denote them as <rev>.
In addition to commit identifiers, branches and tags, there are also a few special identifiers:
- HEAD
- FETCH_HEAD
- ORIG_HEAD
- MERGE_HEAD
Of these the most important is HEAD which points to the commit your working tree is currently based on. This can point to either a commit identifier or a branch. For example, in this case it's pointing to the local master branch which in turn would point to a commit identifier:
$ cat .git/HEAD
ref: refs/heads/master
You can refer to the “parent” of a commit, by appending a ^ to the revision identifier, e.g. HEAD^ or master^ or 593ff32^. You can add as many ^ as you want to refer to the parent of the parent and so on. For example, HEAD^^^ would refer to the great-grandparent of the commit you're currently working from.
In most commands where a revision identifier is expected and one is not given, the command will generally implicitly assume that you meant HEAD. For example, if you just ran git show without any specific revision identifier, it'd assume that you meant git show HEAD.
There is a lot of other syntax related to revision identifiers and you can find out more by running git help rev-parse.
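A quick way to get a feel for revision identifiers is to resolve a few with git rev-parse. The sketch below builds a four-commit history in a scratch repository (file name and commit messages are invented) and checks that a short prefix and the ^ syntax resolve as described; it also uses the ~N form, which is shorthand for N repeated ^ steps.

```shell
# Throwaway identity so the example commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# Four commits in a straight line: Revision 1 through Revision 4.
for n in 1 2 3 4; do
    echo "revision $n" > notes.txt
    git add notes.txt
    git commit -q -m "Revision $n."
done

# A unique prefix resolves to the same full identifier.
full=$(git rev-parse HEAD)
short=$(git rev-parse --short HEAD)
from_prefix=$(git rev-parse "$short")

# HEAD^^^ and HEAD~3 both name the great-grandparent of HEAD.
great_grandparent=$(git rev-parse 'HEAD^^^')
same_via_tilde=$(git rev-parse 'HEAD~3')
subject=$(git log -1 --format=%s 'HEAD^^^')

echo "$short expands to $full"
echo "HEAD^^^ is: $subject"
```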
Branches
Now the branches and tags introduced earlier are simply just named pointers to permanent commit identifiers. Tags are generally expected to be stable pointers — used to indicate a specific public “version” of a project — whilst branches tend to evolve over time and point to different commit identifiers.
The latest committed version of a branch is referred to as its “head” (not to be confused with the special HEAD) and you can find a file in .git/refs/heads/<branch-name> (for local branches) or .git/refs/remotes/<remote>/<branch-name> (for remote branches) which will contain the commit identifier that the branch is currently pointing to.
You can see a list of local branches by running:
$ git branch
* master
Since this is a fresh clone, nothing's been done locally, so only the default master branch has been created. You can see the remote branches you've fetched using:
$ git branch -r
origin/HEAD -> origin/master
origin/master
origin/titanium_1.0
origin/titanium_1.0_win32_osx
origin/titanium_pr4
upstream/master
upstream/titanium_1.0
upstream/titanium_1.0_win32_osx
upstream/titanium_pr4
You can even see a list of all branches (locally and in remotes) by using git branch -a. Of course knowing what branches are available is only of so much use. You are more likely interested in switching to a branch, for which you can use git checkout <branch-name>, e.g.
$ git checkout origin/titanium_1.0
Note: moving to 'origin/titanium_1.0' which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
git checkout -b <new_branch_name>
HEAD is now at cb03eaa... Spaces -> Tabs
Or something like:
$ git checkout origin/titanium_1.0_win32_osx
Previous HEAD position was cb03eaa... Spaces -> Tabs
HEAD is now at 5cbb8e9... readding build script for osx/windows
Or something like:
$ git checkout origin/master
Checking out files: 100% (2095/2095), done.
Previous HEAD position was 5cbb8e9... readding build script for osx/windows
HEAD is now at ba9014e... WebCore:
You can go back to the last branch you were on using:
$ git checkout -
Since you'd “checked out” remote branches, Git would have updated your working tree with the state of the latest commit in the remote branch, set your HEAD to the commit identifier and set your local branch reference to “no branch”:
$ git branch
* (no branch)
master
The * indicates which local branch (if any) you are currently on. You can branch off from a specific revision into a new branch at any time by using git checkout -b <new-branch-name> <rev>. If no revision identifier is specified, then the implicit HEAD (where you currently are) is used:
$ git checkout -b testing
Switched to a new branch 'testing'
$ git branch
master
* testing
You could also use the alternative syntax of git branch <new-branch-name> <rev>, but in that case Git doesn't “switch” to the branch, i.e. update your working tree. It will simply create a pointer for the new branch name to point to the commit identifier for the revision and leave it at that.
You can delete local branches using git branch -d <branch-name>. This refuses to delete a branch whose commits have not been merged into your current branch (or its upstream); to force the deletion you'd need to use git branch -D <branch-name>, so DO NOT use this unless you know what you're doing. Neither form will delete the branch you are currently on.
Similarly, you can delete remote branches using git push <remote> :<branch-name>, e.g.
$ git push origin :titanium_1.0_win32_osx
You can always see which remote branches your local branches are set to track using:
$ git branch -v -v
master ba9014e [origin/master] WebCore:
* testing ba9014e WebCore:
Anytime you are working in a local named branch and do a commit, Git will automatically update the pointer for the branch to the new commit identifier. Branching is super cheap and quick in Git — take advantage of it!
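The following sketch walks through the branch commands from this section in a scratch repository. The branch names testing and experiment and the file notes.txt are just examples.

```shell
# Throwaway identity so the example commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this guide
echo "start" > notes.txt
git add notes.txt
git commit -q -m "Initial commit."

# "git branch" only creates the pointer; we stay on master.
git branch testing
after_branch=$(git rev-parse --abbrev-ref HEAD)

# "git checkout -b" creates the pointer and switches to it.
git checkout -q -b experiment
after_checkout=$(git rev-parse --abbrev-ref HEAD)

# A commit moves only the branch we are on.
echo "more" >> notes.txt
git commit -q -am "Experimental change."
master_tip=$(git rev-parse master)
testing_tip=$(git rev-parse testing)
experiment_tip=$(git rev-parse experiment)

# Back on master: -d deletes the merged branch but refuses the unmerged one.
git checkout -q master
git branch -d testing
if git branch -d experiment 2>/dev/null; then
    delete_result=deleted
else
    delete_result=refused
fi
echo "deleting experiment with -d was $delete_result"
```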
Tags
Tags are similar and should be used to denote public-facing versions or releases. You can create an annotated tag (i.e. a tag with a message/note) using git tag -a <new-tag-name> <rev>:
$ git tag -a ampify-0.1.3.2 -m "Alpha Release 0.1.3.2 of the Ampify."
Or perhaps:
$ git tag -a milestone-base-layout 4a7f7f2092 # will prompt you for a message
And you can list the various tags using:
$ git tag -l
ampify-0.1.3.2
milestone-base-layout
Or specific tags matching a certain pattern, e.g.
$ git tag -l 'ampify*'
ampify-0.1.3.2
And you can “switch” to a tag just like with remote branches, using git checkout:
$ git checkout ampify-0.1.3.2
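Here is a small end-to-end sketch of the tag commands in a scratch repository. The tag names are the ones used above; the tag messages, file and author details are invented for the example.

```shell
# Throwaway identity, needed because annotated tags record a tagger.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "release notes" > NOTES.txt
git add NOTES.txt
git commit -q -m "Initial commit."

# Two annotated tags on the current commit; -m avoids the editor prompt.
git tag -a ampify-0.1.3.2 -m "Alpha release."
git tag -a milestone-base-layout -m "Base layout milestone." HEAD

# Quote the pattern so the shell doesn't expand it before Git sees it.
ampify_tags=$(git tag -l 'ampify*')
first_tag=$(git tag -l | head -n 1)

# Annotated tags are stored as objects of type "tag".
tag_type=$(git cat-file -t ampify-0.1.3.2)

echo "matching tags: $ampify_tags"
echo "object type:   $tag_type"
```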
Push
Your commits, branches and tags are “private” by default. That is, they stay in just your local repository until you explicitly git push them to a remote repository.
To do this you can use the explicit command form git push <remote> <name-of-a-local-branch> to update the remote with commits made to the local branch since your last push, e.g.
$ git push origin master
Counting objects: 9, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 5.49 KiB, done.
Total 5 (delta 3), reused 0 (delta 0)
To git@github.com:tav/ampify.git
ba7f069..912ce7a master -> master
For new branches you MUST use the above form the very first time you push to a remote repository. After the first explicit push you can just do a plain git push. To further control what happens when you do a plain git push, you can configure push.default to one of:
- matching: push all commits across all matching branches, i.e. branches that are common between the local and remote repositories (this is the default behaviour in Git 1.x; Git 2.0 changed the default to simple).
- nothing: do not push anything.
- tracking: push the current branch to whatever it is tracking (newer versions of Git call this upstream).
- current: push the current branch to a branch of the same name.
You can even use git push <remote> to push to a specific remote repository:
$ git push origin
If GitHub is down for some reason, you might see an error like:
Could not chdir to home directory /data/git: No such file or directory
bash: gerve: command not found
fatal: The remote end hung up unexpectedly
The best thing to do in such a case is to alert #github (on irc://irc.freenode.net) about it and then wait. At other times you might see an error like:
! [rejected] master -> master (non-fast forward)
error: failed to push some refs to 'git@github.com:tav/ampify.git'
There are a number of possible causes for this. The first is simply that someone else has made a commit since you last updated your local repository with changes from the remote repository. This can result in what git refers to as “non-fast forward”.
The term “fast forward” simply means that changes in a branch can be advanced along a linear sequence. That is, there was never any divergence or simultaneous commits created in parallel in multiple repositories.
You can therefore normally fix non-fast forward errors on pushes by simply fetching and merging changes from the remote repository before pushing your changes, i.e. git pull before you git push again.
But you might also get the above error if you've rewritten a branch history in some way, e.g. by rebasing or by using git reset to change the branch head to something other than the equivalent on the remote repository.
So, for example, if your revisions look something like:
        A--B--C <-- master (on the local repository)
       /
o--o--X
       \
        Y-Z <-- remotes/origin/master
You can then rewrite it using git rebase <branch-name>, e.g.
$ git rebase origin/master
Which would try to turn the changes back into a linear progression, e.g.
              A'--B'--C' <-- master (on the local repository)
             /
o--o--X--Y--Z <-- remotes/origin/master
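To watch a non-fast forward rejection happen and then clear it with a rebase, you can simulate two developers sharing one repository. Everything in this sketch is local and hypothetical: shared.git stands in for the GitHub repository, and alice and bob are two clones of it.

```shell
# Throwaway identity so the example commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

# A bare shared repository seeded with one commit, plus two clones of it.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit."
cd ..
git clone -q --bare seed shared.git
git clone -q shared.git alice
git clone -q shared.git bob

# Alice commits and pushes first.
cd alice
echo "a" > alice.txt
git add alice.txt
git commit -q -m "Alice's change."
git push -q origin master
cd ../bob

# Bob commits on top of the old head, so his push is not a fast forward.
echo "b" > bob.txt
git add bob.txt
git commit -q -m "Bob's change."
if git push -q origin master 2>/dev/null; then first_push=accepted; else first_push=rejected; fi

# Fetch Alice's commit, replay Bob's commit on top of it, and push again.
git fetch -q origin
git rebase -q origin/master
if git push -q origin master; then second_push=accepted; else second_push=rejected; fi

echo "first push: $first_push, second push: $second_push"
```

Bob's commit had never been pushed when it was rebased, so nobody else's history was rewritten.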
And, finally, your tags are never pushed by default. To push your tags to a remote repository, you'd need to add the --tags parameter:
$ git push --tags
Pull
The flip side to a git push is of course a git pull. This grabs changes from remote repositories and merges them into the current branch. This is dependent on a few lines being in the .git/config file for the given branch which specifies which remote “tracking” branch should be merged automatically.
So, if we were on the local “testing” branch and if the following lines were in your .git/config, Git would fetch changes from the “origin” remote repository and merge in changes from the remote “master” branch:
[branch "testing"]
remote = origin
merge = refs/heads/master
Git sets this up automatically for the local “master” branch when you clone a remote repository. It also adds those lines automatically whenever you create new branches using git branch or git checkout -b if you'd configured branch.autosetupmerge = true.
Otherwise you need to add these lines manually or create new branches using the command git branch --track <new-branch-name> <remote-branch-name>, e.g.
$ git branch --track testing origin/master
You can also create and checkout a tracking branch using git checkout --track <remote-branch-name> which is an optimised form of git checkout --track -b <new-branch-name> <remote-branch-name>.
Now whilst it's nice that git pull automatically does all this for you, it's often better to simply do manually what git pull is doing for you automatically. That is, to do git fetch followed by a git merge.
A fetch command updates your copy of a remote repository with all the objects necessary to access all of the branches and heads in the remote repository. The command is of the form git fetch <remote> — and by default it uses “origin” if no remote was specified:
$ git fetch -v
From github.com:tav/webkit
= [up to date] master -> origin/master
= [up to date] pypy-integration -> origin/pypy-integration
But even better than doing several git fetch commands is to run the following which updates your copies of all of your remote repositories, i.e. equivalent to git fetch origin && git fetch upstream for our previous example:
$ git remote update
Updating origin
Updating upstream
You can also prune any stale tracking branches you might have whilst doing a remote update using git remote update --prune <remote>, e.g.
$ git remote update --prune origin
Doing a manual git merge is also quite easy. A command of the form git merge <rev> will merge any changes from the revision identifier, usually a remote branch, into your current branch, e.g.
$ git merge origin/master
If there were changes and they were “fast forwardable” (i.e. linear), Git will apply them for you cleanly. Otherwise Git would automatically create a new commit — called a “merge commit” — which would commit the changes caused by the merge operation.
If for some reason Git couldn't automatically merge the changes, it'll notify you of the various conflicts and leave it up to you to resolve the conflicts — the conflicted files will have special conflict markers which you'd need to decide between — before doing a manual merge commit yourself.
Instead of manually resolving conflicts in a file, if you know that you just want to use your version of a file during a merge, you can “resolve” it using:
$ git checkout --ours path/to/some/file
Or you can use the file from the merged in revision using:
$ git checkout --theirs path/to/some/other/file
Also relevant here are the ORIG_HEAD and MERGE_HEAD revision identifiers, which are created by commands like merge and respectively refer to the original position of HEAD before the command was executed and the revision(s) you are merging.
You can always put a file back into its initial conflicted state using:
$ git checkout -m path/to/some/file
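As an illustration of resolving a conflict with --theirs, the sketch below creates a deliberate conflict in a scratch repository and settles it in favour of the branch being merged in. The branch name feature, the file settings.txt and its contents are all invented.

```shell
# Throwaway identity so the example commits work on any machine.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this guide
echo "colour = red" > settings.txt
git add settings.txt
git commit -q -m "Added settings."

# Change the same line differently on two branches.
git checkout -q -b feature
echo "colour = blue" > settings.txt
git commit -q -am "Switched to blue."
git checkout -q master
echo "colour = green" > settings.txt
git commit -q -am "Switched to green."

# The merge cannot be completed automatically and stops with a conflict.
if git merge feature >/dev/null 2>&1; then merge_result=clean; else merge_result=conflict; fi

# Take the version from the branch being merged in, then conclude the merge.
git checkout --theirs settings.txt
git add settings.txt
git commit -q -m "Merged feature, keeping its settings."

resolved=$(cat settings.txt)
echo "merge was a $merge_result; settings.txt now reads: $resolved"
```

Using --ours in place of --theirs would have kept the green version from master instead.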
You can list branches that have already been merged/not-merged into the current branch using:
$ git branch --merged
$ git branch --no-merged
You can pass an optional --no-commit parameter to tell Git to not autocommit the resulting merge commit — allowing you to do sanity checks on the changes being applied:
$ git merge origin/master --no-commit
A git merge might sometimes fail with the following rather bizarre error message:
$ git merge origin/master
some/file: needs update
some/other/file: needs update
fatal: Entry 'some/file' not uptodate. Cannot merge.
Merge with strategy recursive failed.
What this means is that the merge operation is trying to apply changes to a file in your local working tree, however that file has changes which your local git repository doesn't know about — because you haven't committed the changes locally!!
The simple solution is just to commit the changes in your working tree to your local repository before re-running the merge command. But if you don't want to commit the changes, but still want to keep your changes around for some purpose, you can use the git stash command to “stash” anything not in your HEAD.
Stash
You stash your working tree changes using git stash save [<message>], e.g.
$ git stash save
You can apply the changes from a stash (by default the most recent) to your current working tree using git stash apply [<stash-name>] or you can apply and remove the stash with the command git stash pop [<stash-name>].
You can list all your stashes with git stash list and show the contents of specific stashes with git stash show <stash-name>. The show subcommand accepts git diff format arguments, so to show the stash in patch form you can run git stash show -p <stash-name>.
And, finally, you can delete all of your stashes with:
$ git stash clear
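A typical stash round trip looks something like the sketch below, run in a scratch repository with an invented notes.txt file. It uses a plain git stash, which behaves like git stash save without a message.

```shell
# Throwaway identity, needed because a stash is stored as commits.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Initial commit."

# Make an uncommitted change, then put it away.
echo "work in progress" >> notes.txt
git stash -q

# The working tree is clean again and the change is parked in the stash list.
dirty_files=$(git status --porcelain)
stash_count=$(git stash list | wc -l | tr -d ' ')

# Bring the change back and drop it from the stash list in one go.
git stash pop -q
restored=$(tail -n 1 notes.txt)
remaining=$(git stash list | wc -l | tr -d ' ')

echo "stashes while parked: $stash_count, after pop: $remaining"
echo "last line of notes.txt: $restored"
```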
Rebase
Instead of merging in changes on top of your local commits, you might occasionally want to apply your changes on top of the updates you've received. This is called rebasing, and it effectively “forward-ports” your local commits to the updated upstream head.
The nice thing about rebasing as opposed to merging is that it is often cleaner, keeps the history linear and avoids all the superfluous merge commits. The downside is that it can be dangerous (i.e. break everything) if you've already shared (i.e. pushed) the local commits.
But as long as you follow the First Rule of Rebasing, everything will be fine:
DO NOT rebase commits that you have already PUSHED
It is also important to note that a rebase will first perform an automatic git checkout <branch> if a branch is given as the second argument; otherwise the current branch is rebased.
So, if we were on the “testing” branch whose revision history looked like:
      A---B---C testing
     /
D---E---F---G master
Then running either:
$ git rebase master
Or:
$ git rebase master testing
Would change the revision history to become:
              A'--B'--C' testing
             /
D---E---F---G master
If the upstream branch, “master” in the above example, already contained a change you'd made (perhaps you'd sent a patch which had been applied upstream), then the corresponding local commit will be skipped.
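The testing/master example can be reproduced in a scratch repository. This is only a sketch: the commit_file helper is a hypothetical convenience that creates one file per commit so that nothing conflicts.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file D; commit_file E
git checkout -q -b testing            # testing forks off at E
commit_file A; commit_file B; commit_file C
git checkout -q master
commit_file F; commit_file G          # master moves on

# Two-argument form: checks out testing first, then replays A, B, C onto master.
git rebase -q master testing

current=$(git rev-parse --abbrev-ref HEAD)
history=$(git log --format=%s --reverse testing | xargs)
echo "on $current: $history"
```

After the rebase the fork point of testing is the tip of master, so the history is linear with no merge commit.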
You can use the --onto version of git rebase to “transplant” a sub-branch of one branch to another. For example, if the revision history looked like:
o---o---o---o---o master
     \
      o---o---o---o---o testing
                       \
                        o---o---o maintests
Then you can run:
$ git rebase --onto master testing maintests
To get:
o---o---o---o---o master
    |            \
    |             o'--o'--o' maintests
     \
      o---o---o---o---o testing
Similarly, you can also use rebase to remove certain commits, e.g.
$ git rebase --onto testing~5 testing~3 testing
Will remove commits B and C from:
A---B---C---D---E---F testing
To form:
A---D'---E'---F' testing
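A quick way to convince yourself of the commit-removal trick is to try it on a disposable branch. In this sketch each commit adds its own file (via a hypothetical commit_file helper), so dropping B and C cannot conflict with the later commits.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/testing
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

for name in A B C D E F; do commit_file "$name"; done

# testing~5 is A and testing~3 is C, so only D, E and F are replayed onto A.
git rebase -q --onto testing~5 testing~3 testing

history=$(git log --format=%s --reverse | xargs)
echo "$history"
```

The files introduced by B and C disappear from the working tree along with their commits.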
You can tell Git to perform a rebase instead of a merge when you do a git pull with:
$ git pull --rebase
You can configure branch.autosetuprebase = always to automatically configure branch.<name>.rebase whenever you create a new branch with git branch or git checkout -b so that git pull will rebase instead of merging:
$ git config --global branch.autosetuprebase always
You can override this behaviour with:
$ git pull --no-rebase
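The effect of branch.autosetuprebase can be checked without touching your global configuration. The sketch below sets it in a scratch repository only, and uses a hypothetical branch called feature that tracks the local master.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "Base"

# Repository-local here; the --global form applies it to all your repositories.
git config branch.autosetuprebase always

# Creating a tracking branch now also records that pulls should rebase.
git branch -q --track feature master
rebase_flag=$(git config branch.feature.rebase)
echo "branch.feature.rebase = $rebase_flag"
```

Branches that existed before the setting was enabled are not changed; for those you would set branch.<name>.rebase yourself.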
Revision Info
There are two commands which will become your best friends when you start to dig deeper into your repositories and want to know more about revision info: git diff and git log. They both take an optional <rev> parameter, which can be any form of identifier mentioned above: commit, branch, tag, etc. Without it, git log starts from HEAD and git diff compares your working tree against the index.
But perhaps the most useful revision identifier is of the form <rev>..<rev> — for example master..testing will incorporate all the revisions since “testing” diverged from “master”. If you leave out a revision identifier on either side, then HEAD is implicitly assumed, e.g. master.. is equivalent to master..HEAD and ..master is equivalent to HEAD..master.
The git log command shows commit logs:
$ git log
commit aeb0fbcb02ea0a6e0cba683cfb8364661e69b83e
Author: tav <tav@espians.com>
Date: Tue Jul 7 21:30:40 2009 +0100
Fixed line-endings and removed extraneous }s and it now compiles!!
You can tweak the command with a lot of optional parameters, but the most useful are git log -p — which shows you the relevant diffs for the commits as well — and the following, which shows you the commit log in a compressed form along with a small graph of the revision relations:
$ git log --oneline --graph
* aeb0fbc Fixed line-endings and removed extraneous }s and it now compiles!!
* 20db6fe Merging in changes since I'd reset to a non-HEAD ..
|\
| * 72e060f Added support to get PyPy bindings building in the xcode project.
| * 43285ab Adding references to PyPy.cpp/h to the WebCore Xcode project.
| * 21a2a65 Adding our default .gitignore to the Webkit repository.
* | 8ac5155 Some basic mods to get the damn thing building.
|/
* 0a1dfdc Added PyPy bindings.
You can see the commit logs between two branches, e.g.
$ git log master..testing --oneline
You can even review the changes in specific files since a certain point, e.g.
$ git log -p ORIG_HEAD.. some/file some/other/file
Or if you just want to see the diffs between two branches, you can for example do:
$ git diff master..testing
And you can always just get a listing of just the changes by passing the optional parameters --stat, --name-only or --name-status to the diff command. Similarly, you can pass in a number to the optional --unified=<number-of-lines> to specify how many lines of context you want around the diff changes.
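The <rev>..<rev> notation is easiest to grasp by counting. This sketch, with made-up commit names, shows that master..testing selects exactly the commits on testing that master lacks, and that leaving out the right-hand side means HEAD.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git checkout -q -b testing
commit_file feature-one
commit_file feature-two

count=$(git rev-list --count master..testing)   # commits only on testing
implicit=$(git rev-list --count master..)       # same thing, HEAD is implied
changed=$(git diff --name-only master..testing) # just the file names

git log --oneline master..testing
echo "$count commits, files: $(echo "$changed" | xargs)"
```

Swapping --name-only for --stat or --name-status gives the other summary formats mentioned above.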
Undoing
If you want to revert any changes you've made to a file in your working tree and go back to the version in the repository, the git checkout command, which you've already seen used to switch to (and sometimes even create) branches, can also be used to re-checkout file(s) from a revision into your current working tree.
The Git equivalent of svn revert is of the form git checkout <rev> -- <files>. If the filename(s) don't conflict with any <rev> revision identifier, then you can leave out the additional --. You can also generally leave out the revision identifier, in which case the files are restored from the index (which matches HEAD unless you have staged changes). So to re-checkout all files, you can run the expected:
$ git checkout .
The git revert <rev> command in contrast behaves quite differently from other version control systems. It “reverts” a commit by applying an inverse patch as an additional commit.
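A small sketch makes the difference from svn revert concrete: the unwanted commit stays in the history and a new commit cancels it out. The file name config.txt and the commit messages are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo good > config.txt
git add config.txt && git commit -q -m "Add working config"
echo bad > config.txt
git commit -q -am "Break the config"

# Apply the inverse of the last commit as a brand new commit.
git revert --no-edit HEAD

commits=$(git rev-list --count HEAD)
content=$(cat config.txt)
git log --oneline
```

Since nothing is rewritten, this is safe to do on commits that have already been pushed.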
The third way you will undo changes in Git is using git reset [--soft|--hard|--merge|--mixed] <rev>:
- --mixed (default) — resets the index (staging area) but leaves the working tree as it is.
- --soft — changes HEAD to point to the given revision, but does not touch the index file or the working tree.
- --hard — resets the current HEAD to the specified revision and updates everything (including your index and working tree) to match. All changes to tracked files in your working tree would be LOST.
- --merge — resets the HEAD and index, but w.r.t your working tree it works similar to the way “git checkout” switches branches, i.e. it takes your local changes while switching to another revision.
Note: running the following can be dangerous:
$ git reset --hard
It will abandon all changes since your last commit. However it can be useful in instances when you just want to wipe away all changes you've just been making or if you'd like to forget about the merge you just did which has resulted in conflicts.
On the other hand, if your merge was successful, and you'd still like to undo it and any other changes you've done since, you can do:
$ git reset --hard ORIG_HEAD
Resetting to alter history is definitely NOT ADVISED unless you know what you're doing, e.g.
$ git reset HEAD~3
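The above command would make the last 3 commits disappear from the branch history (with the default --mixed mode, their changes stay in your working tree as uncommitted modifications). DO NOT do this if it involves resetting commits that you've already shared with anyone, i.e. pushed somewhere!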
You can easily fix something in your last commit using the following command. It will undo your last commit but keep the changes in the staging area:
$ git reset --soft HEAD^
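The difference between --soft and the default --mixed is easy to observe in a scratch repository. In this sketch (file names are hypothetical) the last commit is undone with --soft, which leaves its changes staged, and then a plain git reset unstages them while keeping the file on disk.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo first > first.txt && git add first.txt && git commit -q -m "First"
echo second > second.txt && git add second.txt && git commit -q -m "Secnod (typo)"

# Undo the last commit but keep its changes in the staging area.
git reset -q --soft HEAD^
staged_after_soft=$(git diff --cached --name-only)

# A plain reset is --mixed: the index is reset, the working tree is untouched.
git reset -q
staged_after_mixed=$(git diff --cached --name-only)

echo "after --soft: [$staged_after_soft], after --mixed: [$staged_after_mixed]"
```

At the --soft stage you could simply run git commit again with a corrected message.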
Submodules
Git maintains references to other repositories within a repository using what it calls “submodules”. This is similar to svn:externals except Git submodules can only point to a specific commit identifier.
References to submodules are stored in a special .gitmodules file at the root of the repository. For example, our Ampify repository has a reference to our Redis repository:
$ cat ampify/.gitmodules
[submodule "third_party/redis"]
path = third_party/redis
url = https://github.com/tav/redis.git
The specified path third_party/redis will be an empty directory until you initialise it using:
$ git submodule init
All this does is update your .git/config file with the info it finds in the .gitmodules file:
$ tail -2 ampify/.git/config
[submodule "third_party/redis"]
url = https://github.com/tav/redis.git
You can then modify this file to use a different repository url. For example if you had read/write access to the Redis repository you can change it to:
[submodule "third_party/redis"]
url = git@github.com:tav/redis.git
And then to clone the missing submodules and checkout the commit specified in the index of the containing repository, do:
$ git submodule update
You can combine both of the above commands into one using:
$ git submodule update --init
You'll now see a full repository checked out under third_party/redis with its own .git subdirectory and everything. You can now make changes to this repository just like you would in any normal Git repository.
You can execute commands on all of the submodules of a repository using:
$ git submodule foreach <command>
And you can add new submodules (i.e. checkout a repository and add a reference to the containing repository's .gitmodules file) using git submodule add <remote-repository-url> <path/to/submodule>. This will add the submodule with a reference to the current commit at the path to your staging area, ready for your next commit.
You can add a specific branch by using the optional -b <branch-name> parameter.
If you want to update the commit reference to an existing submodule, then simply update the submodule repository to the state you want to refer to and then do a git add:
$ git add third_party/redis
$ git diff --staged
diff --git a/third_party/redis b/third_party/redis
index cb03eaa..d262952 160000
--- a/third_party/redis
+++ b/third_party/redis
@@ -1 +1 @@
-Subproject commit cb03eaa72b885500cde35952de93bbf1b831af3f
+Subproject commit d2629522d30b737c0efa5ddcc445339513f6ce33
Note: DO NOT put a trailing slash after the submodule name, e.g.
$ git add third_party/redis/
Git will then think that you want to delete the submodule and add all the files in the directory instead! Leave out any trailing slashes and everything will be fine:
$ git add third_party/redis
Also, make sure to have PUSHED ANY COMMITS you made in the submodule repository before updating the containing repository's references. Otherwise you'd be referring to commit(s) that no-one else knows about and the repository will therefore be broken for everyone else.
And, finally, if for some reason the submodule URLs in the .gitmodules file change, you can update the references in your checked out submodule directories using:
$ git submodule sync
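If you'd like to experiment with submodules without touching a real project, the following sketch wires two local scratch repositories together. The lib and app repositories are stand-ins invented for the example, and the protocol.file.allow override is an assumption needed only because recent Git versions refuse local-path submodule clones by default.

```shell
set -e
# Identity for every repository created below (placeholder values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)

# A stand-in for the upstream library (hypothetical).
git init -q "$work/lib"
cd "$work/lib"
echo "v1" > README && git add README && git commit -q -m "Library v1"

# The containing repository.
git init -q "$work/app"
cd "$work/app"
echo "app" > README && git add README && git commit -q -m "App skeleton"

# Add the submodule; the -c override is only for this local-path experiment.
git -c protocol.file.allow=always submodule --quiet add "$work/lib" third_party/lib
git commit -q -m "Add lib submodule"
mode=$(git ls-files -s third_party/lib | cut -d' ' -f1)

# Move the submodule forward, then record the new commit (no trailing slash).
# In real use you would push this submodule commit before recording it.
(cd third_party/lib && echo "v2" > README && git commit -q -am "Library v2")
git add third_party/lib
staged=$(git diff --staged)
echo "$staged"
```

The mode 160000 in the index is how Git marks an entry as a submodule commit reference rather than a file or directory.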
Less Commonly Used Commands
Git has dozens of commands with lots of options, so it'll take anyone a little while to get familiar with it all. This section describes some of the less commonly used commands that are useful nevertheless.
You can use git grep to do searches on files in your repository. The normal use case is to use it as git grep <pattern> <rev>, but you can also use it to search your index and/or specific paths using git grep --cached <pattern> -- <paths>.
You can apply the unified diffs created by git diff using git apply <patchfiles>, e.g.
$ git apply makefile.patch
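Here is a minimal round trip, assuming a made-up Makefile: a change is captured with git diff, thrown away, and then brought back with git apply.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'all:\n\techo build\n' > Makefile
git add Makefile && git commit -q -m "Add Makefile"

# Make a change and save it as a unified diff.
printf 'all:\n\techo rebuilt\n' > Makefile
git diff > makefile.patch

# Throw the change away, then restore it from the patch file.
git checkout -- Makefile
git apply --check makefile.patch   # dry run: would the patch apply cleanly?
git apply makefile.patch

cat Makefile
```

The --check dry run is handy for patches received from someone else, since it reports problems without modifying any files.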
You can auto-generate patchfiles for e-mail submission using git format-patch <rev> which will create appropriate files in your working directory.
Sometimes one messes up with a commit message. Git lets you fix this by simply running git commit --amend which will allow you to edit the commit message of your most recent commit. You SHOULD NOT do this if you've pushed your commit to a remote repository.
If you are interested in who authored each line in a file, you can use git blame <file> or even git blame <file> <rev> if you are interested in seeing the blame for the file in a particular revision, e.g.
$ git blame README.txt
^c406460 (tav 2009-01-21 06:59:34 +0000 33) All of the work, except for ..
02b5ca49 (tav 2009-06-06 11:15:38 +0100 34) placed into the Public Domain ..
02b5ca49 (tav 2009-06-06 11:15:38 +0100 35) Public Domain Dedication ..
Or perhaps if you wanted to selectively merge a single commit from another branch into the current branch, you could use git cherry-pick <rev>, e.g.
$ git cherry-pick 036da1f
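The sketch below picks one of two commits from a hypothetical testing branch onto master. The file names are invented, and testing~1 is used in place of a commit hash so that the script works in any fresh repository.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "Base"

git checkout -q -b testing
echo fix > fix.txt && git add fix.txt && git commit -q -m "Important fix"
echo extra > extra.txt && git add extra.txt && git commit -q -m "Unfinished extra"

# Back on master, take only the fix and leave the unfinished work behind.
git checkout -q master
git cherry-pick testing~1

subject=$(git log -1 --format=%s)
echo "master now ends with: $subject"
```

The picked commit gets a new identifier on master, because its parent differs from the original's.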
You should occasionally “garbage collect” and “pack” the objects/refs in your Git repository using git gc. Newer versions of Git automatically do this for you every once in a while, so you shouldn't have to explicitly call it anymore. In some extreme cases Git can fix corruptions of object references if you run:
$ git repack -a -f
And, finally, the shortcut git clone --mirror sets up a bare repository mirror.
Additional Resources
Hopefully this article has been useful in getting you up and running with Git. If you want to find out more, the best resource is generally the excellent man pages which come with Git. You can access these by running git help <command>, e.g.
$ git help diff
The Git Release Notes are also a very good source of info. If neither of those help, then you might find the following web guides useful:
- http://gitref.org/
- https://github.com/guides/home
- http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
- http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
- http://book.git-scm.com/
- http://cworth.org/hgbook-git/tour/
- http://git.or.cz/gitwiki/GitFaq
- http://git.or.cz/course/svn.html
- http://www.spheredev.org/wiki/Git_for_the_lazy
- http://www-cs-students.stanford.edu/~blynn/gitmagic/
These cheatsheets in particular are rather invaluable:
You can also find a bunch of useful user generated tips and screencasts on these sites:
You might also find that it really helps with understanding Git if you look at it as just sugar coating on top of a directed acyclic graph.
And, finally, of course, there's good old Google to fall back upon. Good luck and happy cloning!
Credits
The Git workflow image, the commits graph and Django repository size comparison info were all taken from the excellent open source http://learn.github.com site.




