Time For a More Advanced Case
Time to consider a more advanced case, which is probably the most common case where we use git merge explicitly: merging branches that have diverged from one another.
Assume we have two people working on this repo now, John and Paul.
John created a branch:
git checkout -b john_branch
And John has written a new song in a new file, lucy_in_the_sky_with_diamonds.md. Well, I believe John Lennon didn't really write in Markdown format, or use Git for that matter, but let's pretend he did for this explanation.
git add lucy_in_the_sky_with_diamonds.md
git commit -m "Commit 5"
While John was working on this song, Paul was also writing, on another branch. Paul had started from main:
git checkout main
And created his own branch:
git checkout -b paul_branch
And Paul wrote his song into a file:
nano penny_lane.md
And committed it:
git add penny_lane.md
git commit -m "Commit 6"
So now our history looks like this, with two different branches branching out from main, each with its own history.
John is happy with his branch (that is, his song), so he decides to merge it into the main branch:
git checkout main
git merge john_branch
This is actually a fast-forward merge, as we have learned before. You can validate that by looking at the history (using git lol, for example):
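If you prefer a check that does not rely on reading the graph, here is a minimal sketch. It assumes a throwaway repository created in a temporary directory, with placeholder file contents, a made-up README.md standing in for the earlier commits, and a dummy identity. The idea is that a fast-forward is possible when main is an ancestor of john_branch, and that afterwards both names resolve to the same commit.

```shell
# Throwaway repo mirroring the example; file contents are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo "intro" > README.md && git add README.md && git commit -q -m "Commit 4"
git checkout -q -b john_branch
echo "John's song" > lucy_in_the_sky_with_diamonds.md
git add lucy_in_the_sky_with_diamonds.md && git commit -q -m "Commit 5"
git checkout -q main

# main is an ancestor of john_branch, so the merge can fast-forward.
git merge-base --is-ancestor main john_branch && echo "fast-forward possible"

# --ff-only makes Git refuse to do anything other than a fast-forward.
git merge -q --ff-only john_branch

# After a fast-forward, both names resolve to the same commit.
git rev-parse main john_branch
```

Using --ff-only here is a design choice for the sketch: if the branches had diverged, the command would stop instead of silently creating a merge commit.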
At this point, Paul also wants to merge his branch into main, but now a fast-forward merge is no longer possible. There are two different histories here: that of main and that of paul_branch. Neither branch simply adds commits on top of the other.
Now things get interesting.
First, let Git do the hard work for you. After that, we will understand what's actually happening under the hood.
git merge paul_branch
Consider the history now:
What you have is a new commit with two parents, "Commit 5" and "Commit 6". In the working dir, you can see that both John's song and Paul's song are there:
ls
Nice, Git really did merge the changes for us. But how does that happen?
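To see the two parents for yourself, you can ask Git about the merge commit directly. The following is a sketch under assumptions: it rebuilds the example in a throwaway repository (placeholder file contents, a made-up README.md for the earlier history, a dummy identity) and then inspects the result of the merge.

```shell
# Throwaway repo mirroring the example; file contents are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo "intro" > README.md && git add README.md && git commit -q -m "Commit 4"
git checkout -q -b john_branch
echo "John's song" > lucy_in_the_sky_with_diamonds.md
git add lucy_in_the_sky_with_diamonds.md && git commit -q -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
echo "Paul's song" > penny_lane.md
git add penny_lane.md && git commit -q -m "Commit 6"
git checkout -q main
git merge -q john_branch              # fast-forward to "Commit 5"
git merge -q --no-edit paul_branch    # true merge: creates a merge commit

# HEAD^1 is the first parent (where main was), HEAD^2 the merged-in branch.
git log -1 --format='%h %s' HEAD^1
git log -1 --format='%h %s' HEAD^2

# The raw commit object lists one tree line and two parent lines.
git cat-file -p HEAD
```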
Undo this last commit:
git reset --hard HEAD~
How to perform a three-way merge in Git
It's time to understand what's really happening under the hood.
What Git has done here is called a 3-way merge. In outlining the process of a 3-way merge, I will use the term "branch" for simplicity, but you should remember you could also merge two (or more) commits that are not referenced by a branch.
The 3-way merge process includes these stages:
First, Git locates the common ancestor of the two branches. That is, the common commit from which the merging branches most recently diverged. Technically, this is actually the first commit that is reachable from both branches. This commit is then called the merge base.
Second, Git calculates two diffs β one diff from the merge base to the first branch, and another diff from the merge base to the second branch. Git generates patches based on those diffs.
Third, Git applies both patches to the merge base using a 3-way merge algorithm. The result is the state of the new, merge commit.
So, back to our example.
In the first step, Git looks from both branches, main and paul_branch, and traverses the history to find the first commit that is reachable from both. In this case, this would be... which commit?
Correct, "Commit 4".
If you are not sure, you can always ask Git directly:
git merge-base main paul_branch
By the way, this is the most common and simple case, where we have a single obvious choice for the merge base. In more complicated cases, there may be multiple possibilities for a merge base, but this is a topic for another post.
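As a small illustration of the single-merge-base case, the sketch below rebuilds the example in a throwaway repository (placeholder file contents, a made-up README.md, and a dummy identity are all assumptions) and asks Git for the merge base. The --all flag prints every best common ancestor, so in this simple history it should print the same single commit.

```shell
# Throwaway repo mirroring the example; file contents are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo "intro" > README.md && git add README.md && git commit -q -m "Commit 4"
git checkout -q -b john_branch
echo "John's song" > lucy_in_the_sky_with_diamonds.md
git add lucy_in_the_sky_with_diamonds.md && git commit -q -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
echo "Paul's song" > penny_lane.md
git add penny_lane.md && git commit -q -m "Commit 6"
git checkout -q main
git merge -q john_branch    # main now points to "Commit 5"

# Store the merge base and show which commit it is.
base=$(git merge-base main paul_branch)
git log -1 --format='%h %s' "$base"

# List every best common ancestor; here there is only one.
git merge-base --all main paul_branch
```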
In the second step, Git calculates the diffs. So it first calculates the diff between "Commit 4" and "Commit 5":
git diff 4f90a62 4683aef
(The SHA-1 values will be different on your machine)
If you don't feel comfortable with the output of git diff, please read the previous post where I described it in detail.
You can store that diff to a file:
git diff 4f90a62 4683aef > john_branch_diff.patch
Next, Git calculates the diff between "Commit 4" and "Commit 6":
git diff 4f90a62 c5e4951
Write this one to a file as well:
git diff 4f90a62 c5e4951 > paul_branch_diff.patch
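Since the SHA-1 values differ on every machine, it can be convenient to produce the same two patches from names instead. This is a sketch under assumptions: a throwaway repository with placeholder file contents, a made-up README.md, and a dummy identity. It also shows that the three-dot form of git diff compares the merge base with the second branch.

```shell
# Throwaway repo mirroring the example; file contents are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo "intro" > README.md && git add README.md && git commit -q -m "Commit 4"
git checkout -q -b john_branch
echo "John's song" > lucy_in_the_sky_with_diamonds.md
git add lucy_in_the_sky_with_diamonds.md && git commit -q -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
echo "Paul's song" > penny_lane.md
git add penny_lane.md && git commit -q -m "Commit 6"
git checkout -q main
git merge -q john_branch    # main now points to "Commit 5"

# Diff from the merge base to each side, without copying SHA-1 values.
base=$(git merge-base main paul_branch)
git diff "$base" main        > john_branch_diff.patch
git diff "$base" paul_branch > paul_branch_diff.patch

# Three dots: diff from the merge base of the two names to the second one.
git diff main...paul_branch
```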
Now Git applies those patches on the merge base.
First, try that out directly: just apply the patches (I will walk you through it in a moment). This is not what Git really does under the hood, but it will help you gain a better understanding of why Git needs to do something different.
Check out the merge base first, that is, "Commit 4":
git checkout 4f90a62
And apply John's patch first:
git apply --index john_branch_diff.patch
Notice that for now there is no merge commit. git apply updates the working dir as well as the index, as we used the --index switch.
You can observe the status using git status:
So now John's new song is incorporated into the index. Apply the other patch:
git apply --index paul_branch_diff.patch
As a result, the index contains changes from both branches.
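To confirm what ended up in the index, you can list the staged paths. The sketch below is only an illustration under assumptions: it rebuilds the example in a throwaway repository (placeholder file contents, a made-up README.md, a dummy identity), applies both patches on top of the merge base, and prints the staged changes.

```shell
# Throwaway repo mirroring the example; file contents are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo "intro" > README.md && git add README.md && git commit -q -m "Commit 4"
git checkout -q -b john_branch
echo "John's song" > lucy_in_the_sky_with_diamonds.md
git add lucy_in_the_sky_with_diamonds.md && git commit -q -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
echo "Paul's song" > penny_lane.md
git add penny_lane.md && git commit -q -m "Commit 6"
git checkout -q main
git merge -q john_branch    # main now points to "Commit 5"

# Create the two patches relative to the merge base.
base=$(git merge-base main paul_branch)
git diff "$base" main        > john_branch_diff.patch
git diff "$base" paul_branch > paul_branch_diff.patch

# Check out the merge base (detached HEAD) and apply both patches.
git checkout -q "$base"
git apply --index john_branch_diff.patch
git apply --index paul_branch_diff.patch

# Paths staged in the index relative to the merge base: both songs.
git diff --cached --name-only
git status --short
```

The patch files themselves stay untracked, so they show up in git status but are not part of the index.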
Now it's time to commit your merge. Since the porcelain command git commit would generate a commit with a single parent here, you need the underlying plumbing command, git commit-tree.
If you need a reminder about porcelain vs plumbing commands, check out the post where I explained these terms, and created an entire repo from scratch (swimm).
Remember that every Git commit object points to a single tree (swimm). So you need to record the contents of the index in a tree:
git write-tree
This prints the SHA-1 value of the created tree, and you can create a commit object from it using git commit-tree:
git commit-tree <TREE_SHA> -p <COMMIT_5> -p <COMMIT_6> -m "Merge commit!"
Great, so you have created a commit object.
Recall that git merge also changes HEAD to point to the new merge commit object. So you can simply do the same:
git reset --hard db315a
If you look at the history now:
You can see that you've reached the same result as the merge done by Git, with the exception of the timestamp and thus the SHA-1 value, of course.
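One way to check that claim is to compare trees: the commit SHA-1 values differ, but the tree recorded by the hand-made merge commit should be identical to the tree produced by git merge. The following is a sketch under assumptions: a throwaway repository with placeholder file contents, a made-up README.md, a dummy identity, and a hypothetical scratch_merge branch used only to hold the reference merge.

```shell
# Throwaway repo mirroring the example; file contents are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo "intro" > README.md && git add README.md && git commit -q -m "Commit 4"
git checkout -q -b john_branch
echo "John's song" > lucy_in_the_sky_with_diamonds.md
git add lucy_in_the_sky_with_diamonds.md && git commit -q -m "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
echo "Paul's song" > penny_lane.md
git add penny_lane.md && git commit -q -m "Commit 6"
git checkout -q main
git merge -q john_branch    # main now points to "Commit 5"

# Manual merge: patches on top of the merge base, then plumbing commands.
base=$(git merge-base main paul_branch)
commit5=$(git rev-parse main)
commit6=$(git rev-parse paul_branch)
git diff "$base" "$commit5" > john_branch_diff.patch
git diff "$base" "$commit6" > paul_branch_diff.patch
git checkout -q "$base"
git apply --index john_branch_diff.patch
git apply --index paul_branch_diff.patch
tree=$(git write-tree)
manual=$(git commit-tree "$tree" -p "$commit5" -p "$commit6" -m "Merge commit!")
git reset -q --hard "$manual"

# Reference merge done by Git on a hypothetical scratch branch.
git checkout -q -b scratch_merge main
git merge -q --no-edit paul_branch

# Different commits, same tree: these two lines should be identical.
git rev-parse "${manual}^{tree}" "scratch_merge^{tree}"
```

Note that because the merge base was checked out by its SHA-1, HEAD is detached while the manual commit is built, so the reset in this sketch moves only HEAD and leaves main pointing at "Commit 5".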
So you got to merge both the contents of the two commits (that is, the state of the files) and the history of those commits, by creating a merge commit that points to both histories.
In this simple case, you could actually just apply the patches using git apply, and everything worked quite well.