Git Commit Dag
Git is a powerful version control system, but the mental model can be complex. Beginners can find such concepts as fast-forward merges confusion, and sooner or later everyone ends up in the dreaded "detached head" mode. This article will build up a mental model of Git to understand what's really happening, ensuring that you're never trapped, and allowing you to perform impressive feats of Git surgery with aplomb.
Git commits as a DAG🔗
The fundamental object that we'll be talking about is a commit1. A commit is the current state of the file tree, along with some metadata. Some of this metadata is an author, and a message the author includes to explain the changes since the last commit. Another extremely important bit of metadata is the commit's parent commits. Most commits have a single parent, which is the previous state of the file tree. Sometimes, such as in a merge, a commit has multiple parents, since it inherits from and reconciles two or more different states. A fancy way of saying this is that a git repo forms a Directed Acyclic Graph (or DAG), with the commits as nodes, and the parent relationship being an arrow.
Before we get started, let's make some aliases that will make visualization easier. Enter the following commands:
$ git config --global alias.lol "log --graph --decorate --pretty=oneline --abbrev-commit"
$ git config --global alias.lola "log --graph --decorate --pretty=oneline --abbrev-commit --all"
Also, make sure you have the latest version of git (as of this writing, 2.5.0). On OS X, I suggest installing git via Homebrew.
Our first commit🔗
Let's make this concrete. Make a new directory called git-dag-tutorial
,
enter it, and then initialize the git repository:
$ mkdir git-dag-tutorial
$ cd git-dag-tutorial
$ git init
$ git status
On branch master
Initial commit
nothing to commit (create/copy files and use "git add" to track)
You now have an empty repository; it doesn't even have a commit yet. Let's make our first commit.
$ echo 'A0' > a.txt # Make a file just containing 'A0'
$ git add a.txt
$ git commit -m 'A0'
[master (root-commit) c32f318] A0
1 file changed, 1 insertion(+)
create mode 100644 a.txt
$ git lol
* c32f318 (HEAD -> master) A0
$ A0=`git rev-parse HEAD`
The repo now has as a single commit, and this commit is the only one that does not have a parent. This is an important property of a repo -- there is only one 'root' commit, so if you trace back the ancestry of any commit, you will eventually end up here.
The command git lol
will show us the current commit (called HEAD
), and its
parent(s) recursively until we reach the root commit. Try it now. We only
have one commit, so it will just show that. The first column is of the form
c32f318...
; the crazy hex number is called the commit hash. It will
be different for everybody, so in the last line we just saved it in the
environment variable A0
. Also, try git show HEAD
; notice that it shows you
that same commit, with some more information on it.
We are also on a branch. Type git branch
, and notice it outputs * master
.
This means we are on the master branch. A branch is really just a pointer to
a given commit that keeps up as we create new commits. Try git show master
;
notice that it gives the same commit. Actually, we were being inaccurate above;
in reality HEAD doesn't point to our first commit A0, it points to master
which
then points to A0. This is shown in the output of git lol
; a single commit,
with a hash that we'll represent by A0, that is pointed to by branch master
.
HEAD
in turns points to master
.
It's important to understand the difference between when HEAD
pointing to
master
which points to A0, and when both HEAD
and master
point to A0
separately. We can get to the latter state by:
$ git checkout $A0 # This will use the last commit above.
$ git lol
* c32f318 (HEAD, master) A0
git checkout
roughly means "move HEAD
to the given place, and change the
filesystem to match HEAD
."
Notice that when you did the checkout, you got a scary message about being in
'detached HEAD' state. Many people are afraid of this cryptic message, but you now
know what it means. We can easily escape it by typing git checkout master
,
which puts us back in our earlier, safer, state. Do that now:
$ git checkout master
Branching🔗
Let's add to our file.
$ echo 'A1' >> a.txt # Add the line 'A1'; make sure you have double brackets!
$ git commit a.txt -m 'A1'
[master 0c5a853] A1
1 file changed, 1 insertion(+)
$ git lol
* 0c5a853 (HEAD -> master) A1
* c32f318 A0
The order means that A0 is the parent of A1.
Notice that master
has moved up to the new commit A1, dragging HEAD
up with
it. Had we been in the 'detached HEAD' mode above when we made our commit,
our graph would look like:
$ git lol
* 0c5a853 (HEAD) A1
* c32f318 (master) A0
In this case, if we committed in the detached HEAD
state, we would have
left branch master
behind. Being on a branch means the branch moves with us.
Now let's experiment with branching!
$ git branch dev
$ git branch
dev
* master
$ git lol
* 0c5a853 (HEAD -> master, dev) A1
* c32f318 A0
The first command makes a branch, pointed at our current commit, and the second
command shows the branches, putting an *
before the branch HEAD
is pointing
to. Notice that in the log, we see that both master
and dev
point to A1,
and HEAD
points to master
.
Now let's switch branches:
$ git checkout dev
$ git branch
* dev
master
$ git lol
* 0c5a853 (HEAD -> dev, master) A1
* c32f318 A0
HEAD
is now pointing to branch dev
.
Now let's make another commit:
$ echo 'B2' > b.txt
$ git add b.txt
$ git commit -m 'B2'
$ git lol
* 4e7b4c1 (HEAD -> dev) B2
* 0c5a853 (master) A1
* c32f318 A0
Checking git lol
shows three commits; the most recent we'll call B2.
Notice master
is still pointed at A1, while dev
(and HEAD
) is pointed
at B2. Let's go back to master to see where we are:
$ git checkout master
$ ls
$ git lol
* 0c5a853 (HEAD -> master) A1
* c32f318 A0
Notice that there is only the file a.txt
, and the log only shows A1 and A0,
which is what we expect. When we are
pointing at a commit, we can only get information about its ancestors, not
its descendants. This is an important enough point that we'll put it in
bold: A commit knows everything about its ancestors, and nothing about
its descendants 2.
We can see the graph with all the branches by
$ git lola
* 4e7b4c1 (dev) B2
* 0c5a853 (HEAD -> master) A1
* c32f318 A0
Let's make an additional commit on master
.
$ echo 'A2' > a.txt
$ git commit a.txt -m 'A2'
$ git lol
* 93f50b6 (HEAD -> master) A2
* 0c5a853 A1
* c32f318 A0
We can see how master
and dev
relate:
$ git lola
* 93f50b6 (HEAD -> master) A2
| * 4e7b4c1 (dev) B2
|/
* 0c5a853 A1
* c32f318 A0
Our branches have diverged! The parent of both A2 and B2 is A1, but A2 and B2 don't know anything about each other.
Merging🔗
How does merging fit into this picture? Merging two branches makes a commit that has the tip of each branch as parents.
$ git merge dev
$ git lol
* eb328ca (HEAD -> master) A3
|\
| * 4e7b4c1 (dev) B2
* | 93f50b6 A2
|/
* 0c5a853 A1
* c32f318 A0
Notice that dev
is still pointing to B2, but master
now points to a commit
A3 that has both A2 and B2 as parents. Now that dev
has been merged into
master
, master
knows all about dev
and its history.
What's up with dev
?
$ git checkout dev
$ git lol
* 4e7b4c1 (HEAD -> dev) B2
* 0c5a853 A1
* c32f318 A0
$ git merge master
$ git lol
* eb328ca (HEAD -> dev, master) A3
|\
| * 4e7b4c1 B2
* | 93f50b6 A2
|/
* 0c5a853 A1
* c32f318 A0
Notice that this merge didn't make a new commit, it just moved dev
to point
at A3; this is called a 'fast-forward' merge. This is another very important
point: By default, if a merge can be accomplished by moving a branch to an
existing commit that satisfies the merge, it will do this.
You can also add the --no-ff
flag to a merge, which forces it to not
fast-forward. Had we done that, the commits would be
* d85fcb6 (HEAD -> dev) B3
|\
| * eb328ca (master) A3
| |\
| |/
|/|
* | 4e7b4c1 B2
| * 93f50b6 A2
|/
* 0c5a853 A1
* c32f318 A0
This makes the commit tree more complex, but keeps an explicit record of the merge.
Conclusion🔗
Now you understand, at a commit level, what's happening when you branch and
merge. It might be fruitful to experiment with committing, branching,
merging, detaching HEAD
, git reset
, and more, checking with git lol
and
git lola
to see the commit tree.
Footnotes🔗
Commits are made up of blobs and trees, which are more fundamental, but we'll not go down to that level.
In fact, a commit that has nothing pointing to it is considered garbage, and will be cleaned up and deleted by git in about 30 days.