Part 2 - Branching and Integrating Changes
Chapter 6 - Diffs and Patches
In Part 1 you learned how Git works under the hood, the different Git objects, and how to create a repo from scratch.
When teams work with Git, they introduce sequences of changes, usually in branches, and then they need to combine different change histories together. To really understand how this is achieved, you should learn how Git treats diffs and patches. You will then apply your knowledge to understand the processes of merging and rebasing.
Many of the interesting processes in Git like merging, rebasing, or even committing are based on diffs and patches. Developers work with diffs all the time, whether using Git directly or relying on the IDE's diff view. In this chapter, you will learn what Git diffs and patches are, their structure, and how to apply patches.
As a reminder from the chapter on Git Objects, a commit is a snapshot of the working tree at a certain point in time, in addition to some meta-data.
Yet, it is really hard to make sense of individual commits by looking at the entire working tree. Rather, it is more helpful to look at how different a commit is from its parent commit, that is, the diff between these commits.
So, what do I mean when I say "diff"? Let's start with some history.
Git Diff's History
Git's diff is based on the diff utility on Unix systems. diff was developed in the early 1970s on the Unix operating system. The first released version shipped with the Fifth Edition of Unix in 1974.
git diff is a command that takes two inputs and computes the difference between them. Inputs can be commits, but also files, and even files that have never been introduced to the repository.
This is important - git diff computes the difference between two strings, which most of the time happen to consist of code, but not necessarily.
Time to Get Hands-On
As always, you are encouraged to run the commands yourself while reading this chapter. Unless noted otherwise, I will use the following repository:
You can clone it locally and have the same starting point I am using for this chapter.
Consider this short text file on my machine, called file.txt, which consists of 6 lines:
Now, modify this file a bit. Remove the second line, and insert a new line as the fourth line. Add an exclamation mark (!) to the end of the last line, so you get this result:
Save this file with a new name, new_file.txt.
Now you can run git diff to compute the difference between the files like so:
git diff --no-index file.txt new_file.txt
I will explain the --no-index switch of this command later. For now it's enough to understand that it allows us to compare two files that are not part of a Git repository.
The output of git diff shows quite a lot of things.
Focus on the part starting with This is a file. You can see that the added line (// new test) is preceded by a + sign. The deleted line is preceded by a - sign.
Interestingly, notice that Git views a modified line as a sequence of two changes - erasing a line and adding a new line instead. So the patch includes deleting the last line, and adding a new line that's equal to that line, with the addition of a !.
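To see the same idea outside Git, here is a small Python sketch using the standard library's difflib.unified_diff, which emits the same unified format. Only "This is a file", "It has a nice poem:", "// new test", and "Are belong to you" come from the text. The three middle lines are hypothetical placeholders for lines not reproduced here. difflib omits Git's own "diff --git" and "index" lines, and its matching algorithm is not Git's, so treat this as an approximation of the hunk Git shows.

```python
import difflib

# Hypothetical reconstruction of file.txt and new_file.txt:
# the three middle lines are placeholders, not the original poem.
file_txt = [
    "This is a file",
    "It has a nice poem:",
    "Roses are red",
    "Violets are blue",
    "All your base",
    "Are belong to you",
]
new_file_txt = [
    "This is a file",
    "Roses are red",        # second line removed
    "Violets are blue",
    "// new test",          # new fourth line
    "All your base",
    "Are belong to you!",   # last line "modified"
]

patch = list(difflib.unified_diff(
    file_txt, new_file_txt,
    fromfile="a/file.txt", tofile="b/new_file.txt",
    lineterm="",            # our lines carry no trailing newlines
))
print("\n".join(patch))
```

The modified last line shows up as a - line immediately followed by a + line, exactly as described above.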
Now would be a good time to discuss the terms "patch" and "diff". These two are often used interchangeably, although there is a distinction, at least historically.
A diff shows the differences between two files, or snapshots, and can be quite minimal in doing so. A patch is an extension of a diff, augmented with further information such as context lines and filenames, which allow it to be applied more widely. It is a text document that describes how to alter an existing file or codebase.
These days, the Unix diff program, and git diff, can produce patches of various kinds.
A patch is a compact representation of the differences between two files. It describes how to turn one file into another.
In other words, if you apply the "instructions" produced by git diff on file.txt - that is, remove the second line, insert // new test as the fourth line, remove the last line, and add instead a line with the same content and ! - you will get the content of new_file.txt.
Another important thing to note is that a patch is asymmetric: the patch from file.txt to new_file.txt is not the same as the patch for the other direction. Generating a patch between new_file.txt and file.txt, in this order, would produce exactly the opposite instructions - add the second line instead of removing it, and so on.
Try it out:
git diff --no-index new_file.txt file.txt
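To check this asymmetry in code, the Python sketch below uses difflib again, with the same hypothetical file contents standing in for the real ones. It diffs the files in both directions and verifies that every line one patch removes, the other patch adds.

```python
import difflib

# Hypothetical file contents; the middle lines are placeholders.
file_txt = ["This is a file", "It has a nice poem:", "Roses are red",
            "Violets are blue", "All your base", "Are belong to you"]
new_file_txt = ["This is a file", "Roses are red", "Violets are blue",
                "// new test", "All your base", "Are belong to you!"]

def unified(a, b, name_a, name_b):
    """Unified diff from a to b as a list of lines."""
    return list(difflib.unified_diff(a, b, fromfile=name_a, tofile=name_b,
                                     lineterm=""))

def changed(patch, sign):
    """Content lines starting with `sign`, skipping the two ---/+++ header lines."""
    return sorted(line[1:] for line in patch[2:] if line.startswith(sign))

forward = unified(file_txt, new_file_txt, "a/file.txt", "b/new_file.txt")
backward = unified(new_file_txt, file_txt, "a/new_file.txt", "b/file.txt")

# The two patches mirror each other: removals in one are additions in the other.
print(changed(forward, "-") == changed(backward, "+"))  # True
print(changed(forward, "+") == changed(backward, "-"))  # True
```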
The patch format uses context, as well as line numbers, to locate differing file regions. This allows a patch to be applied to a somewhat earlier or later version of the first file than the one from which it was derived, as long as the applying program can still locate the context of the change. We will see exactly how these are used.
The Structure of a Diff
It's time to dive deeper.
Generate a diff from file.txt to new_file.txt again, and consider the output more carefully:
git diff --no-index file.txt new_file.txt
The first line introduces the compared files. Git always gives one file the name a, and the other the name b. So in this case file.txt is called a, whereas new_file.txt is called b.
Then the second line, starting with index, includes the blob SHAs of these files. So even though in our case they are not even stored within a Git repo, Git shows their corresponding SHA-1 values.
The third value in this line, 100644, is the "mode bits", indicating that this is a "regular" file: not executable and not a symbolic link.
The two dots (..) between the blob SHAs here are just a separator (unlike other places in Git, such as commit ranges, where .. has a special meaning).
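Git can show blob SHAs for files outside a repository because a blob ID depends only on the file's content. The Python sketch below reproduces that computation: SHA-1 over a header made of the word "blob", a space, the size in bytes, and a NUL byte, followed by the raw content. The index line in a diff shows abbreviated prefixes of these IDs. git_blob_sha1 is an illustrative helper, not a Git API. On a real file, git hash-object should print the same full value.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Return the object ID Git would give a blob with this exact content."""
    # Header: the word "blob", a space, the size in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_sha1(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_sha1(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```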
Other header lines might indicate the old and new mode bits if they've changed, old and new filenames if the files were being renamed, and so on.
The blob SHAs (also called "blob IDs") are helpful if this patch is later applied by Git to the same project and there are conflicts while applying it. You will better understand what this means when you learn about the merges in the next chapter.
After the blob IDs, we have two lines: one starting with --- and the other starting with +++. This is the traditional "unified diff" header, again showing the files being compared and the direction of the changes: - signs show lines in the A version that are missing from the B version, and + signs show lines missing in the A version but present in B.
If the patch were of this file being added or deleted in its entirety, then one of these would be /dev/null to signal that.
Consider the case where you delete a file:
rm awesome.txt
And then use git diff:
The A version, representing the state of the index, is awesome.txt. The B version represents the working dir, where this file no longer exists, so it is /dev/null. All lines are preceded by - signs as they exist only in the A version.
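Here is a Python sketch of the same shape using difflib, with a hypothetical three-line awesome.txt (the real file's content isn't reproduced here). Passing an empty list as the new version and labeling it /dev/null yields the hunk header @@ -1,3 +0,0 @@. That header reads as three lines starting at line 1 in A, and zero lines in B.

```python
import difflib

# Hypothetical content of awesome.txt.
awesome_txt = ["line one", "line two", "line three"]

# The file was deleted, so the B side is empty and labeled /dev/null.
deletion = list(difflib.unified_diff(
    awesome_txt, [],
    fromfile="a/awesome.txt", tofile="/dev/null",
    lineterm="",
))
print("\n".join(deletion))
```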
For now, undo the deletion (more on undoing changes in Part 3):
git restore awesome.txt
Going back to the diff we started with:
After this unified diff header, we get to the main part of the diff, consisting of "difference sections", also called "hunks" or "chunks" in Git. Note that these terms are used interchangeably, and you may stumble upon either of them in Git's documentation and tutorials, as well as Git's source code.
Every hunk begins with a single line, starting with two @ signs. These signs are followed by at most four numbers, then two more @ signs, and then a header for the chunk - which is an educated guess by Git. Usually, it will include the beginning of a function or a class, when possible.
In this example it doesn't include anything as this is a text file, so consider another example for a moment:
git diff --no-index example.py example_changed.py
In the image above, the hunk's header includes the beginning of the function that includes the changed lines - def example_function(x).
Back to our previous example then:
After the two @ signs, you'll find four numbers:
The first two numbers are preceded by a - sign as they refer to file A. The first number represents the line number corresponding to the first line in file A that this hunk refers to. In the example above, it is 1, meaning that the line This is a file corresponds to line number 1 in file A.
This number is followed by a comma (,), and then the number of lines this chunk consists of in file A. This number includes all context lines (the lines preceded with a space in the diff), as well as lines marked with a - sign, as they are part of file A, but not lines marked with a + sign, as they do not exist in file A.
In our example, this number is 6, counting the context line This is a file, the - line It has a nice poem:, then the three context lines, and lastly Are belong to you.
As you can see, the lines beginning with a space character are context lines, which means they appear as shown in both file A and file B.
Then, we have a + sign to mark the two numbers that refer to file B. First, there's the line number corresponding to the first line in file B, followed by the number of lines this chunk consists of in file B.
This number includes all context lines, as well as lines marked with the + sign, as they are part of file B, but not lines marked with a - sign.
These four numbers are followed by two additional @ signs.
After the header of the chunk, we get the actual lines - either context, -, or + lines.
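To make the arithmetic concrete, the following Python sketch parses a hunk header and recounts the lines of a hunk body. File A's count is the context lines plus the - lines, and file B's count is the context lines plus the + lines. The regular expression and the helper names are illustrative. The middle lines of the example hunk are the same hypothetical placeholders as before. Following the unified diff convention, a count omitted from the header means 1.

```python
import re

# "@@ -start[,count] +start[,count] @@ optional section header"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")

def parse_hunk_header(line):
    """Return (a_start, a_count, b_start, b_count, section)."""
    m = HUNK_HEADER.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    a_start, a_count, b_start, b_count, section = m.groups()
    # An omitted count means the hunk covers exactly one line.
    return (int(a_start), int(a_count or 1),
            int(b_start), int(b_count or 1), section)

def count_hunk_lines(body):
    """Number of hunk lines that belong to file A and to file B."""
    context = sum(line.startswith(" ") for line in body)
    removed = sum(line.startswith("-") for line in body)
    added = sum(line.startswith("+") for line in body)
    return context + removed, context + added

body = [
    " This is a file",
    "-It has a nice poem:",
    " Roses are red",        # placeholder context lines
    " Violets are blue",
    "+// new test",
    " All your base",
    "-Are belong to you",
    "+Are belong to you!",
]
print(parse_hunk_header("@@ -1,6 +1,6 @@"))  # (1, 6, 1, 6, '')
print(count_hunk_lines(body))                 # (6, 6)
```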
Typically and by default, a hunk starts and ends with three context lines. For example, if you modify lines 4-5 in a file with ten lines:
- Line 1 - context line (before the changed lines)
- Line 2 - context line (before the changed lines)
- Line 3 - context line (before the changed lines)
- Line 4 - changed line
- Line 5 - another changed line
- Line 6 - context line (after the changed lines)
- Line 7 - context line (after the changed lines)
- Line 8 - context line (after the changed lines)
- Line 9 - this line will not be part of the hunk
So by default, changing lines 4-5 results in a hunk consisting of lines 1-8, that is, three lines before and three lines after the modified lines.
If that file doesn't have ten lines, but rather six lines - then the hunk will contain only one context line after the changed lines, and not three. Similarly, if you change the second line of a file, then there would be only one line of context before the changed lines.
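You can reproduce these boundaries with a short Python sketch. It builds a file of numbered placeholder lines, changes lines 4 and 5, and asks difflib for the hunk headers. difflib's n parameter plays the role of Git's default of three context lines, which git diff -U<n> changes. This is a model of the rule, not Git's own code.

```python
import difflib

def hunk_headers(num_lines, changed, context=3):
    """Change the given 1-based lines of a num_lines-line file and return
    the hunk headers produced with `context` lines of context."""
    old = [f"Line {i}" for i in range(1, num_lines + 1)]
    new = [f"Line {i} (edited)" if i in changed else f"Line {i}"
           for i in range(1, num_lines + 1)]
    diff = difflib.unified_diff(old, new, lineterm="", n=context)
    return [line for line in diff if line.startswith("@@")]

print(hunk_headers(10, {4, 5}))             # ['@@ -1,8 +1,8 @@']  lines 1-8
print(hunk_headers(6, {4, 5}))              # ['@@ -1,6 +1,6 @@']  one line after
print(hunk_headers(10, {4, 5}, context=1))  # ['@@ -3,4 +3,4 @@']  lines 3-6
```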
How to Produce Diffs
The last example we considered shows a diff between two files. A single patch file can contain the differences for any number of files, and git diff produces diffs for all altered files in the repository in a single patch.
Often, you will see the output of git diff showing two versions of the same file and the difference between them.
To demonstrate, consider the state in another branch called diffs:
git checkout diffs
Again, I encourage you to run the commands with me - make sure you clone the repository from:
In its current state, the active directory is a Git repository with a clean status:
Take an existing file, my_file.py:
And change the second line from print('An example function!') to print('An example function! And it has been changed!'):
Save your changes, but don't stage or commit them. Next, run git diff:
The output of git diff shows the difference between my_file.py's version in the staging area, which in this case is the same as the last commit (HEAD), and the version in the working directory.
I covered the terms "working directory", "staging area", and "commit" in the Git Objects chapter, so check it out in case you would like to refresh your memory. As a reminder, the terms "staging area" and "index" are interchangeable, and both are widely used.
To see the difference between the working dir and the staging area, use git diff, without any additional flags.
As you can see, git diff lists both file A and file B as my_file.py. Here, file A refers to the version of my_file.py in the staging area, whereas file B refers to its version in the working dir.
Note that if you modify my_file.py in a text editor and don't save the file, then git diff will not be aware of the changes you've made. This is because they haven't been saved to the working dir.
We can provide a few switches to git diff to get the diff between the working dir and a specific commit, or between the staging area and the latest commit, or between two commits, and so on.
First create a new file, new_file.txt, and save it:
Currently the file is in the working dir, and it is actually untracked in Git.
Now stage and commit this file:
git add new_file.txt
git commit -m "Commit 3"
Now, the state of HEAD is the same as the state of the staging area, as well as the working tree:
Next, edit new_file.txt by adding a new line at the beginning and another new line at the end:
As a result, the state is as follows:
A nice trick would be to use git add -p, which allows you to split the changes even within a file, and consider which ones you'd like to stage.
In this case, add the first line to the index, but not the last line. To do that, you can split the hunk using s, then stage the first part (using y), but not the second part (using n).
If you are not sure what each letter stands for, you can always use a ? and Git will tell you.
So now the state in HEAD is without either of those new lines. In the staging area you have the first line but not the last line, and in the working dir you have both new lines.
If you use git diff, what will happen?
Well, as stated before, you get the diff between the staging area and the working tree.
What happens if you want to get the diff between HEAD and the staging area? For that, you can use git diff --cached:
And what if you want the difference between HEAD and the working tree? For that you can run git diff HEAD:
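The three commands differ only in which pair of states they compare. The Python sketch below models that with the new_file.txt example. "START!" and the two original lines come from this chapter, while "END!" is a hypothetical stand-in for the line added at the end. The COMMANDS mapping is an illustration, not something Git exposes.

```python
import difflib

# Toy model of the three states of new_file.txt in this example.
states = {
    "HEAD": ["This is a new file", "With new content!"],
    "index": ["START!", "This is a new file", "With new content!"],
    "working tree": ["START!", "This is a new file", "With new content!", "END!"],
}

# Which pair of states each command compares: (A side, B side).
COMMANDS = {
    "git diff": ("index", "working tree"),
    "git diff --cached": ("HEAD", "index"),
    "git diff HEAD": ("HEAD", "working tree"),
}

def changed_lines(command):
    """The +/- lines the command would report for new_file.txt in this model."""
    a_side, b_side = COMMANDS[command]
    diff = difflib.unified_diff(states[a_side], states[b_side], lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

for command in COMMANDS:
    print(f"{command:18} -> {changed_lines(command)}")
```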
To summarize the different switches for git diff we have seen so far, here's a diagram:
As a reminder, at the beginning of this chapter you used git diff --no-index. With the --no-index switch, you can compare two files that are not part of the repository - or of any staging area.
Now, commit the changes you have in the staging area:
git commit -m "Commit 4"
To observe the diff between this commit and its parent commit, you can run the following command:
git diff HEAD~1 HEAD
By the way, you can omit the 1 above and write HEAD~, and get the same result. Using 1 is the explicit way to state you are referring to the first parent of the commit.
Note that writing the parent commit, HEAD~1, first results in a diff showing how to get from the parent commit to the current commit. Of course, I could also generate the reverse diff by writing:
git diff HEAD HEAD~1
To summarize all the different switches for git diff we covered in this section, see this diagram:
A short way to view the diff between a commit and its parent is by using git show, for example:
git show HEAD
Apart from the commit's metadata (author, date, and message), which git show prints first, this is the same as writing:
git diff HEAD~ HEAD
We can now update our diagram:
You can go back to this diagram as a reference when needed.
As a reminder, Git commits are snapshots - of the entire working directory of the repository, at a certain point in time. Yet, it is often more useful to think of a commit not as a whole snapshot, but in terms of the changes this specific commit introduced - in other words, the diff between its parent commit and the commit itself.
As you learned in the Git Objects chapter, Git stores the entire snapshots. The diff is dynamically generated from the snapshot data - by comparing the root trees of the commit and its parent.
Of course, Git can compare any two snapshots in time, not just adjacent commits, and also generate a diff of files not included in a repository.
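As a rough illustration, the Python sketch below treats each snapshot as a dictionary from path to content and derives the per-file changes between a parent commit and its child. The snapshot contents are hypothetical, loosely modeled on "Commit 3" and "Commit 4". Git itself walks the two root trees, can skip any subtree whose object ID is unchanged, and then diffs the changed blobs line by line. This sketch only classifies files.

```python
def diff_snapshots(parent, child):
    """Classify every path as added, deleted, or modified between two snapshots."""
    changes = {}
    for path in sorted(parent.keys() | child.keys()):
        if path not in parent:
            changes[path] = "added"
        elif path not in child:
            changes[path] = "deleted"
        elif parent[path] != child[path]:
            changes[path] = "modified"
        # Paths with identical content are unchanged and not reported.
    return changes

# Hypothetical snapshots loosely modeled on "Commit 3" and "Commit 4".
commit_3 = {
    "my_file.py": "def example():\n    print('An example function!')\n",
    "new_file.txt": "This is a new file\nWith new content!\n",
}
commit_4 = {
    "my_file.py": commit_3["my_file.py"],
    "new_file.txt": "START!\nThis is a new file\nWith new content!\n",
}
print(diff_snapshots(commit_3, commit_4))  # {'new_file.txt': 'modified'}
```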
How to Apply Patches
By using git diff you can see a patch Git generates, and you can then apply this patch using git apply.
Historical Note
Actually, sharing patches used to be the main way to share code in the early days of open source. But now - virtually all projects have moved to sharing Git commits directly through pull requests (called "merge requests" on some platforms).
The biggest problem with using patches is that it is hard to apply a patch when your working directory does not match the sender's previous commit. Losing the commit history makes it difficult to resolve conflicts. You will better understand this as you dive deeper into the process of git apply, especially in the next chapter where we cover merges.
A Simple Patch
What does it mean to apply a patch? It's time to try it out!
Take the output of git diff:
git diff HEAD~1 HEAD
And store it in a file:
git diff HEAD~1 HEAD > my_patch.patch
Use reset to undo the last commit:
git reset --hard HEAD~1
Don't worry about the last command - I'll explain it in detail in Part 3, where we discuss undoing changes. In short, it allows us to "reset" where HEAD is pointing to, as well as the state of the index and of the working dir. In the example above, they are all set to the state of HEAD~1, or "Commit 3" in the diagram.
So after running the reset command, the contents of the file are as follows (the state from "Commit 3"):
nano new_file.txt
![new_file.txt](https://freecodecamp.org/news/content/images/2023/12/nano_new_file-1.png)
And you will apply this patch that you've just saved:
nano my_patch.patch
This patch tells Git to find the lines:
This is a new file
With new content!
which used to be lines 1 and 2 in new_file.txt, and add a line with the content START! right above them.
Run this command to apply the patch:
git apply my_patch.patch
And as a result, you get this version of your file, just like the commit you have created before:
nano new_file.txt
Understanding the Context Lines
To understand the importance of context lines, consider a more advanced scenario. What happens if line numbers have changed since you created the patch file?
To test, start by creating another file:
nano test.txt
Stage and commit this file:
git add test.txt
git commit -m "Test file"
Now, change this file by adding a new line, and also erasing the line before the last one:
Observe the difference between the original version of the file and the version including your changes:
git diff -- test.txt
(Adding -- test.txt tells Git to compute the diff for test.txt only, so you don't get the diff for other files.)
Store this diff into a patch file:
git diff -- test.txt > new_patch.patch
Now, reset your state to that before introducing the changes:
git reset --hard
If you were to apply new_patch.patch now, it would simply work.
Let's now consider a more interesting case. Modify test.txt again by adding a new line at the beginning:
As a result, the line numbers are different from the original version where the patch has been created. Consider the patch you created before:
It assumes that the line With more text is the second line in test.txt, which is no longer the case. So... will git apply work?
git apply new_patch.patch
It worked!
By default, Git looks for 3 lines of context before and after each change introduced in the patch - as you can see, they are included in the patch file. If you take three lines before and after the added line, and three lines before and after the deleted line (actually only one line after, as no other lines exist), you get exactly the lines in the patch file. If all these lines exist in the same order, applying the patch works, even if the line numbers have changed.
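The Python sketch below models this behaviour in a simplified way; it is not Git's implementation. A hunk's pre-image (its context and - lines) must appear contiguously in the target, and the line number from the header is only a hint for choosing among matches. The first line of test.txt and the new first line added above are hypothetical. The other lines are as quoted in this chapter.

```python
def apply_hunk(lines, hunk, expected_start):
    """Apply one unified-diff hunk to `lines` and return the new list of lines.

    The pre-image (' ' and '-' lines) must match exactly somewhere in `lines`;
    expected_start (1-based, from the hunk header) is only used to break ties.
    """
    before = [l[1:] for l in hunk if l[0] in " -"]
    after = [l[1:] for l in hunk if l[0] in " +"]
    matches = [i for i in range(len(lines) - len(before) + 1)
               if lines[i:i + len(before)] == before]
    if not matches:
        raise ValueError("patch does not apply: context not found")
    pos = min(matches, key=lambda i: abs(i - (expected_start - 1)))
    return lines[:pos] + after + lines[pos + len(before):]

test_txt = [
    "This is a test file",              # hypothetical first line
    "With more text",
    "It has some really nice lines",
    "Like this one",
    "And that one",
    "How wonderful",
    "So we are writing an example",
    "Git is awesoome!",
]
hunk = [  # body of a hunk with header "@@ -2,7 +2,7 @@"
    " With more text",
    " It has some really nice lines",
    " Like this one",
    "+!!!This is the new line!!!",
    " And that one",
    " How wonderful",
    "-So we are writing an example",
    " Git is awesoome!",
]

shifted = ["A brand new first line"] + test_txt   # every line moved down by one
print(apply_hunk(shifted, hunk, expected_start=2))  # still applies
```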
Reset the state again:
git reset --hard
What happens if you change one of the context lines? Try it out by changing the line With more text to With more text!:
And now:
git apply new_patch.patch
Well, no. The patch does not apply. If you are not sure why, or just want to better understand the process Git is performing, you can add the --verbose flag to git apply, like so:
git apply --verbose new_patch.patch
It seems that Git searched for a sequence of lines from the file, including the line "With more text" right before the line "It has some really nice lines". This sequence of lines no longer exists in the file. As Git cannot find this sequence, it cannot apply the patch.
As mentioned earlier, by default, Git looks for 3 lines of context before and after each change introduced in the patch. If the surrounding three lines do not exist, Git cannot apply the patch.
You can ask Git to rely on fewer lines of context, using the -C argument. For example, to ask Git to look for 1 line of surrounding context, run the following command:
git apply -C1 new_patch.patch
The patch applies!
Why is that? Consider the patch again:
When applying the patch with the -C1 option, Git is looking for the lines:
Like this one
And that one
in order to add the line !!!This is the new line!!! between these two lines. These lines exist (and, importantly, they appear one right after the other). As a result, Git can successfully add the line between them, even though the line numbers changed.
Similarly, Git would look for the lines:
How wonderful
So we are writing an example
Git is awesoome!
As Git can find these lines, Git can erase the middle one.
If we changed one of these lines, say, changed "How wonderful" to "How very wonderful", then Git would not be able to find the sequence above, and thus the patch would not apply.
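Here is a simplified Python model of what -C1 relaxes. Before searching, it drops leading and trailing context lines so that at most n remain at each end of the hunk. Context lines between the two changes stay required. Real git apply trims context step by step and retries, so treat this as an approximation. trim_context and apply_hunk are illustrative helpers, and, as before, the first line of test.txt is hypothetical.

```python
def trim_context(hunk, keep):
    """Keep at most `keep` context lines before the first change and after the last."""
    changes = [i for i, line in enumerate(hunk) if line[0] in "+-"]
    first, last = changes[0], changes[-1]
    return hunk[max(0, first - keep):last + 1 + keep]

def apply_hunk(lines, hunk):
    """Replace the first exact match of the hunk's pre-image with its post-image."""
    before = [line[1:] for line in hunk if line[0] in " -"]
    after = [line[1:] for line in hunk if line[0] in " +"]
    for i in range(len(lines) - len(before) + 1):
        if lines[i:i + len(before)] == before:
            return lines[:i] + after + lines[i + len(before):]
    raise ValueError("patch does not apply: context not found")

hunk = [
    " With more text",
    " It has some really nice lines",
    " Like this one",
    "+!!!This is the new line!!!",
    " And that one",
    " How wonderful",
    "-So we are writing an example",
    " Git is awesoome!",
]
# test.txt after changing "With more text" to "With more text!".
modified = [
    "This is a test file",              # hypothetical first line
    "With more text!",
    "It has some really nice lines",
    "Like this one",
    "And that one",
    "How wonderful",
    "So we are writing an example",
    "Git is awesoome!",
]

try:
    apply_hunk(modified, hunk)          # full 3-line context: fails
except ValueError as err:
    print(err)
print(apply_hunk(modified, trim_context(hunk, 1)))  # 1 line of context: applies
```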
Recap - Git Diff and Patch
In this chapter, you learned what a diff is, and the difference between a diff and a patch. You learned how to generate various patches using different switches for git diff. You also learned what the output of git diff looks like, and how it is constructed. Ultimately, you learned how patches are applied, and specifically the importance of context.
Understanding diffs is a major milestone for understanding many other processes within Git - for example, merging or rebasing, that we will explore in the next chapters.
Chapter 7 - Understanding Git Merge
By reading this chapter, you are going to really understand git merge, one of the most common operations you'll perform in your Git repositories.
What is a Merge in Git?
Merging is the process of combining the recent changes from several branches into a single new commit. This commit points back, as its parents, to the commits these branches pointed to.
In a way, merging is the complement of branching in version control: a branch allows you to work simultaneously with others on a particular set of files, whereas a merge allows you to later combine separate work on branches that diverged from a common ancestor commit.
OK, let's take this bit by bit.
Remember that in Git, a branch is just a name pointing to a single commit. When we think about commits as being "on" a specific branch, they are actually reachable through the parent chain from the commit that the branch is pointing to.
That is, if you consider this commit graph:
You see the branch feature_1, which points to a commit with the SHA-1 value of ba0d2. As in previous chapters, I only write the first 5 digits of the SHA-1 value for brevity.
Notice that commit 54a9d is also "on" this branch, as it is the parent commit of ba0d2. So if you start from the pointer of feature_1, you get to ba0d2, which then points to 54a9d. You can follow the chain of parents, and all these reachable commits are considered to be "on" feature_1.
When you merge with Git, you merge commits. Almost always, we merge two commits by referring to them with the branch names that point to them. Thus we say we "merge branches" - though under the hood, we actually merge commits.
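A branch is only a name mapped to a commit ID, and "being on a branch" means being reachable through parent links. The Python sketch below models that with plain dictionaries. ba0d2 and 54a9d come from the graph above, while 1f3c9, 7e2aa, and the branch name other are hypothetical, added to round out the example.

```python
# Commit ID -> list of parent IDs. ba0d2 and 54a9d come from the text;
# 1f3c9 and 7e2aa are hypothetical.
parents = {
    "1f3c9": [],
    "54a9d": ["1f3c9"],
    "ba0d2": ["54a9d"],
    "7e2aa": ["1f3c9"],      # a commit on some other line of history
}
# A branch is just a name pointing to a single commit.
branches = {"feature_1": "ba0d2", "other": "7e2aa"}

def commits_on(branch):
    """All commits reachable from the branch's commit via parent links."""
    reachable, stack = set(), [branches[branch]]
    while stack:
        commit = stack.pop()
        if commit not in reachable:
            reachable.add(commit)
            stack.extend(parents[commit])
    return reachable

print(sorted(commits_on("feature_1")))  # ['1f3c9', '54a9d', 'ba0d2']
```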
Time to Get Hands-on
For this chapter, I will use the following repository:
As in previous chapters, I encourage you to clone it locally and have the same starting point I am using for this chapter.
OK, so let's say I have this simple repository here, with a branch called main, and a few commits with the commit messages of "Commit 1", "Commit 2", and "Commit 3":
Next, create a feature branch by typing git branch new_feature:
And switch HEAD to point to this new branch, by using git checkout new_feature (or git switch new_feature). You can look at the outcome by using git log:
As a reminder, you could also write git checkout -b new_feature, which would both create a new branch and change HEAD to point to this new branch.
If you need a reminder about branches and how they're implemented under the hood, please check out chapter 2. Yes, check out. Pun intended.
Now, on the new_feature branch, implement a new feature. In this example, I will edit an existing file that looks like this before the edit:
And I will now edit it to include a new function:
And luckily, this is not a programming book, so this function is legit.
Next, stage and commit this change:
git add code.py
git commit -m "Commit 4"
Looking at the history, you have the branch new_feature, now pointing to "Commit 4", which points to its parent, "Commit 3". The branch main is also pointing to "Commit 3".
Time to merge the new feature! That is, merge these two branches, main and new_feature. Or, in Git's lingo, merge new_feature into main. This means merging "Commit 4" and "Commit 3". This is pretty trivial, as after all, "Commit 3" is an ancestor of "Commit 4".
Check out the main branch (with git checkout main), and perform the merge by using git merge new_feature:
Since new_feature never really diverged from main, Git could just perform a fast-forward merge. So what happened here? Consider the history:
Even though you used git merge, there was no actual merging here. Actually, Git did something very simple - it reset the main branch to point to the same commit as the branch new_feature.
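The decision Git makes here can be modeled in a few lines of Python. If the current branch's commit is an ancestor of the commit being merged, moving the pointer is enough. The commit labels are hypothetical stand-ins for "Commit 1" through "Commit 4". The allow_ff=False option loosely mirrors the --no-ff flag described next. This is a sketch of the rule, not Git's code.

```python
def is_ancestor(graph, ancestor, commit):
    """True if `ancestor` is reachable from `commit` (a commit is its own ancestor)."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c == ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return False

def merge_into(graph, branches, current, other, allow_ff=True):
    """Decide how merging `other` into `current` would proceed (simplified)."""
    ours, theirs = branches[current], branches[other]
    if is_ancestor(graph, theirs, ours):
        return "already up to date"
    if allow_ff and is_ancestor(graph, ours, theirs):
        branches[current] = theirs   # fast-forward: only the pointer moves
        return "fast-forward"
    return "merge commit needed"

graph = {"commit1": [], "commit2": ["commit1"],
         "commit3": ["commit2"], "commit4": ["commit3"]}
branches = {"main": "commit3", "new_feature": "commit4"}
print(merge_into(graph, branches, "main", "new_feature"))  # fast-forward
print(branches["main"])                                    # commit4
```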
In case you don't want that to happen, but rather you want Git to really perform a merge, you could either change Git's configuration, or run the merge command with the --no-ff flag.
First, undo the last commit:
git reset --hard HEAD~1
Reminder: if this way of using reset is not clear to you, don't worry - we will cover it in detail in Part 3. It is not crucial for this introduction of merge, though. For now, it's important to understand that it basically undoes the merge operation.
Just to clarify, if you now checked out new_feature again:
git checkout new_feature
The history would look just like before the merge:
Next, perform the merge with the --no-ff flag (short for "no fast-forward"):
git checkout main
git merge new_feature --no-ff
Now, if we look at the history using git lol:
(Reminder: git lol is an alias I added to Git to view the history graphically. You can find it, along with the other components of my setup, in the My Setup part of the Introduction chapter.)
Considering this history, you can see Git created a new commit, a merge commit.
If you look at this commit more closely:
git log -n1
You will see that this commit actually has two parents - "Commit 4", which was the commit that new_feature pointed to when you ran git merge, and "Commit 3", which was the commit that main pointed to.
A merge commit has two parents: the two commits it merged.
The merge commit shows us the concept of merge quite well. Git takes two commits, usually referenced by two different branches, and merges them together.
After the merge, as you started the process from main, you are still on main, and the history from new_feature has been merged into this branch. Since you started with main, "Commit 3", which main pointed to, is the first parent of the merge commit, whereas "Commit 4", which you merged into main, is the second parent of the merge commit.
Notice that you started on main when it pointed to "Commit 3", and Git went quite a long way for you. It changed the working tree and the index, created a new commit object, and moved main (and with it HEAD) to point to that commit. At least when you use git merge without the --no-commit flag and when it's not a fast-forward merge, Git does all of that.
This was a super simple case, where the branches you merged didn't diverge at all. We will soon consider more interesting cases.
By the way, you can use git merge to merge more than two commits - actually, any number of commits. This is rarely done, and to adhere to the practicality principle of this book, I won't delve into it.
Another way to think of git merge is as joining two or more development histories together. That is, when you merge, you incorporate changes from the named commits, since the time their histories diverged from the current branch, into the current branch. I used the term "branch" here, but I am stressing this again - we are actually merging commits.
Time For a More Advanced Case
Time to consider a more advanced case, which is probably the most common case where we use git merge explicitly - where you need to merge branches that did diverge from one another.
Assume we have two people working on this repo now, John and Paul.
John created a branch:
git checkout -b john_branch
And John has written a new song in a new file, lucy_in_the_sky_with_diamonds.md. Well, I believe John Lennon didn't really write in Markdown format, or use Git for that matter, but let's pretend he did for this explanation.
git add lucy_in_the_sky_with_diamonds.md
git commit -m "Commit 5"
While John was working on this song, Paul was also writing, on another branch. Paul had started from main:
git checkout main
And created his own branch:
git checkout -b paul_branch
And Paul wrote his song into a file called penny_lane.md. Paul staged and committed this file:
git add penny_lane.md
git commit -m "Commit 6"
So now our history looks like this - where we have two different branches, branching out from main, with different histories:
John is happy with his branch (that is, his song), so he decides to merge it into the main branch:
git checkout main
git merge john_branch
Actually, this is a fast-forward merge, as we have learned before. You can validate that by looking at the history (using git lol, for example):
At this point, Paul also wants to merge his branch into main, but now a fast-forward merge is no longer relevant - there are two different histories here: the history of main and that of paul_branch. It's not that paul_branch only adds commits on top of main or vice versa.
Now things get interesting.
First, let Git do the hard work for you. After that, we will understand what's actually happening under the hood.
git merge paul_branch
Consider the history now:
What you have is a new commit, with two parents - "Commit 5" and "Commit 6".
In the working dir, you can see that both John's song and Paul's song are there (if you use ls, you will see both files in the working dir).
Nice, Git really did merge the changes for you. But how does that happen?
Undo this last commit:
git reset --hard HEAD~
How to Perform a Three-way Merge in Git
It's time to understand what's really happening under the hood.
What Git has done here is called a 3-way merge. In outlining the process of a 3-way merge, I will use the term "branch" for simplicity, but you should remember you could also merge two (or more) commits that are not referenced by a branch.
The 3-way merge process includes these stages:
First, Git locates the common ancestor of the two branches. That is, the common commit from which the merging branches most recently diverged. Technically, this is actually the first commit that is reachable from both branches. This commit is then called the merge base.
Second, Git calculates two diffs - one diff from the merge base to the first branch, and another diff from the merge base to the second branch. Git generates patches based on those diffs.
Third, Git applies both patches to the merge base using a 3-way merge algorithm. The result is the state of the new merge commit.
![The three steps of the 3-way merge algorithm](https://freecodecamp.org/news/content/images/2023/12/3_way_merge.png)
The three steps of the 3-way merge algorithm: (1) locate the common ancestor; (2) calculate diffs from the merge base to the first branch, and from the merge base to the second branch; (3) apply both patches together
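In the John and Paul example, the two branches touched different files, so the third step is easy to picture. The Python sketch below applies the idea at whole-file granularity. For each path, it compares both sides to the merge base and keeps whichever side changed it. This is a deliberately simplified model. Git runs a line-level 3-way merge inside files that both sides changed, and it also handles renames and other cases this sketch ignores. The file contents are hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge three snapshots (path -> content) one whole file at a time.

    Returns (merged_snapshot, conflicting_paths). A path missing from a
    snapshot is treated as absent (None).
    """
    merged, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # both sides agree (possibly both unchanged)
            result = o
        elif o == b:        # only "theirs" changed this path
            result = t
        elif t == b:        # only "ours" changed this path
            result = o
        else:               # both changed it differently: needs a finer merge
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts

# Hypothetical contents; the file names follow the John and Paul example.
base = {"code.py": "def new_feature():\n    pass\n"}
main = {**base, "lucy_in_the_sky_with_diamonds.md": "John's song\n"}
paul_branch = {**base, "penny_lane.md": "Paul's song\n"}

merged, conflicts = three_way_merge(base, main, paul_branch)
print(sorted(merged))  # ['code.py', 'lucy_in_the_sky_with_diamonds.md', 'penny_lane.md']
print(conflicts)       # []
```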
So, back to our example.
In the first step, Git starts from both branches, main and paul_branch, and traverses the history to find the first commit that is reachable from both. In this case, this would be... which commit?
Correct, the merge commit (the one with "Commit 3" and "Commit 4" as its parents).
If you are not sure, you can always ask Git directly:
git merge-base main paul_branch
![The merge base is the merge commit with "Commit 3" and "Commit 4" as its parents](https://freecodecamp.org/news/content/images/2023/12/3_way_merge_base.png)
The merge base is the merge commit with "Commit 3" and "Commit 4" as its parents. Note: the previous merge commit is blurred, as it is no longer reachable from the current history following the reset command.
By the way, this is the most common and simple case, where we have a single obvious choice for the merge base. In more complicated cases, there may be multiple possibilities for a merge base, but this is not within our focus.
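To make the notion of a merge base more concrete, here is a minimal Python sketch. It is not Git's implementation. It assumes a hypothetical in-memory history where each commit name maps to its parent commits, and the commit names only loosely follow this chapter's example.

The sketch keeps every common ancestor that is not itself an ancestor of another common ancestor. This matches how the git merge-base documentation defines a "best common ancestor", which is what git merge-base --all lists. The second history is a hypothetical "criss-cross" shape, one of the more complicated cases where there are two possible merge bases.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: commit name -> tuple of parent commit names
Graph = Dict[str, Tuple[str, ...]]


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Return all commits reachable from `commit` via parent links, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not an ancestor of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c
        for c in common
        if not any(c != d and c in ancestors(graph, d) for d in common)
    }


# Simplified, hypothetical history: "merge" has Commit 3 and Commit 4 as parents,
# and the two branches then diverge into Commit 5 and Commit 6.
history: Graph = {
    "commit_1": (),
    "commit_2": ("commit_1",),
    "commit_3": ("commit_2",),
    "commit_4": ("commit_2",),
    "merge": ("commit_3", "commit_4"),
    "commit_5": ("merge",),
    "commit_6": ("merge",),
}
print(merge_bases(history, "commit_5", "commit_6"))  # {'merge'}

# Hypothetical "criss-cross" history where each side merged the other: two merge bases
criss_cross: Graph = {
    "base": (),
    "a1": ("base",),
    "b1": ("base",),
    "a2": ("a1", "b1"),
    "b2": ("b1", "a1"),
}
print(sorted(merge_bases(criss_cross, "a2", "b2")))  # ['a1', 'b1']
```

Recomputing ancestors inside the comprehension is quadratic. That is acceptable for a toy graph, but a real implementation walks the history far more efficiently.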
In the second step, Git calculates the diffs. So it first calculates the diff between the merge commit and "Commit 5":
git diff 4f90a62 4683aef
(The SHA-1 values will be different on your machine.)
![The diff between the merge commit and "Commit 5"](https://freecodecamp.org/news/content/images/2023/12/diff_4_5.png)
The diff between the merge commit and "Commit 5"
If you don't feel comfortable with the output of git diff, you can read the previous chapter, where I described it in detail.
You can store that diff to a file:
git diff 4f90a62 4683aef > john_branch_diff.patch
Next, Git calculates the diff between the merge commit and "Commit 6":
git diff 4f90a62 c5e4951
![The diff between the merge commit and "Commit 6"](https://freecodecamp.org/news/content/images/2023/12/diff_4_6.png)
The diff between the merge commit and "Commit 6"
Write this one to a file as well:
git diff 4f90a62 c5e4951 > paul_branch_diff.patch
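The two diffs computed here can be mimicked in Python to show that each one is taken against the same merge base. This is a minimal illustration, not how Git computes diffs. It assumes each commit's state is modeled as a hypothetical dictionary from file path to file contents, and the file names are made up.

It uses the standard library's difflib.unified_diff. Its output resembles git diff but is not identical: for a newly created file, for instance, Git shows /dev/null as the old side.

```python
import difflib
from typing import Dict

# Hypothetical model of a commit's state: file path -> file contents
Snapshot = Dict[str, str]


def snapshot_diff(base: Snapshot, other: Snapshot) -> str:
    """Return a unified diff covering every file that differs between two snapshots."""
    lines = []
    for path in sorted(base.keys() | other.keys()):
        # Simplification: a missing file is treated as an empty one
        before, after = base.get(path, ""), other.get(path, "")
        if before == after:
            continue  # unchanged files contribute nothing to the diff
        lines.extend(
            difflib.unified_diff(
                before.splitlines(keepends=True),
                after.splitlines(keepends=True),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
            )
        )
    return "".join(lines)


# Hypothetical snapshots: John's and Paul's commits each add a different file
merge_base = {"README.md": "Our band\n"}
commit_5 = {**merge_base, "john_song.txt": "John's song\nla la la\n"}
commit_6 = {**merge_base, "paul_song.txt": "Paul's song\nna na na\n"}

john_branch_diff = snapshot_diff(merge_base, commit_5)
paul_branch_diff = snapshot_diff(merge_base, commit_6)
print(john_branch_diff)
```

Each diff mentions only the file its own branch touched, which is why, in this example, the two patches do not interfere with each other.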
Now Git applies those patches on the merge base.
First, try that out directly - just apply the patches (I will walk you through it in a moment). This is not what Git really does under the hood, but it will help you gain a better understanding of why Git needs to do something different.
Check out the merge base first, that is, the merge commit:
git checkout 4f90a62
And apply John's patch first (as a reminder, this is the patch shown in the image with the caption "The diff between the merge commit and "Commit 5""):
git apply --index john_branch_diff.patch
Notice that for now there is no merge commit. git apply updates the working dir as well as the index, since we used the --index switch.
You can observe the status using git status:
So now John's new song is incorporated into the index. Apply the other patch:
git apply --index paul_branch_diff.patch
As a result, the index contains changes from both branches.
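Applying the two patches one after the other can be modeled in a few lines of Python. This is a deliberately simplified sketch. It assumes a hypothetical patch format in which each entry records:

- a path;
- the whole-file content the patch expects to find there (None for a new file);
- the new content (None for a deletion).

Real patches match context lines within hunks rather than whole files. The sketch applies a patch atomically or not at all, much like git apply, which by default rejects the whole patch if any part of it does not apply.

It also hints at why naively applying patches is not enough. In this whole-file model, once both patches touch the same file, the second one no longer finds the content it expects.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]  # hypothetical: file path -> contents
# Hypothetical patch entry: (path, expected old contents or None, new contents or None)
Patch = List[Tuple[str, Optional[str], Optional[str]]]


def apply_patch(state: Snapshot, patch: Patch) -> Snapshot:
    """Return a new snapshot with the patch applied; raise if any entry does not match."""
    result = dict(state)  # work on a copy so a failure leaves `state` untouched
    for path, old, new in patch:
        if result.get(path) != old:
            raise ValueError(f"patch does not apply to {path}")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


merge_base: Snapshot = {"README.md": "Our band\n"}
john_patch: Patch = [("john_song.txt", None, "John's song\n")]
paul_patch: Patch = [("paul_song.txt", None, "Paul's song\n")]

# Different files: applying one patch after the other succeeds
index = apply_patch(apply_patch(merge_base, john_patch), paul_patch)
print(sorted(index))  # ['README.md', 'john_song.txt', 'paul_song.txt']

# Both patches rewrite README.md from the same starting point: the second one fails
john_edit: Patch = [("README.md", "Our band\n", "Our band, by John\n")]
paul_edit: Patch = [("README.md", "Our band\n", "Our band, by Paul\n")]
try:
    apply_patch(apply_patch(merge_base, john_edit), paul_edit)
except ValueError as err:
    print(err)  # patch does not apply to README.md
```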
Now it's time to commit your merge. Since in this situation the porcelain command git commit would generate a commit with a single parent, you need the underlying plumbing command, git commit-tree.
If you need a reminder about porcelain vs plumbing commands, check out chapter 4 where I explained these terms, and created an entire repo from scratch.
Remember that every Git commit object points to a single tree. So you need to record the contents of the index in a tree:
git write-tree
Now you get the SHA-1 value of the created tree, and you can create a commit object using git commit-tree:
git commit-tree <TREE_SHA> -p <COMMIT_5> -p <COMMIT_6> -m "Merge commit!"
Great, so you have created a commit object!
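To see what makes this a merge commit at the object level, here is a small Python sketch. It serializes a commit object's content and hashes it in Git's object format: SHA-1 over a header with the object type and content size, a NUL byte, and then the content. The tree and parent SHA-1 values, the author identity, and the timestamps are hypothetical placeholders.

The only thing that distinguishes a merge commit is the second parent line. Because the timestamp is part of the hashed content, creating the same commit at a different time yields a different SHA-1.

```python
import hashlib
from typing import List


def commit_content(tree: str, parents: List[str], ident: str, timestamp: int, message: str) -> bytes:
    """Serialize a commit object's content: one 'parent' line per parent commit."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def object_sha1(kind: str, content: bytes) -> str:
    """Hash an object as Git does: SHA-1 over "<type> <size>", a NUL byte, then the content."""
    header = f"{kind} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Sanity check against a well-known value: the SHA-1 of an empty blob
print(object_sha1("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Hypothetical placeholders standing in for <TREE_SHA>, <COMMIT_5> and <COMMIT_6>
tree_sha = "1" * 40
commit_5, commit_6 = "5" * 40, "6" * 40
ident = "Jane Doe <jane@example.com>"

merge = commit_content(tree_sha, [commit_5, commit_6], ident, 1700000000, "Merge commit!")
print(merge.decode())
print(object_sha1("commit", merge))

# Same tree, parents and message, one minute later: a different object
later = commit_content(tree_sha, [commit_5, commit_6], ident, 1700000060, "Merge commit!")
print(object_sha1("commit", merge) != object_sha1("commit", later))  # True
```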
Recall that git merge also changes HEAD to point to the new merge commit object. So you can simply do the same, using the SHA-1 value that git commit-tree printed on your machine:
git reset --hard db315a
If you look at the history now:
Note: in this state, HEAD is "detached", that is, it points directly to a commit object rather than to a named reference. gg does not show HEAD when it is "detached", so don't be confused if you can't see HEAD in the output of gg.
This is almost what we wanted. Remember that when you ran git merge, the result was HEAD pointing to main, which pointed to the newly created commit (as shown in the image with the caption "When you merge paul_branch, you get a new merge commit"). What should you do then?
Well, what you want is to modify main, so you can just point it to the new commit:
git checkout main
git reset --hard db315a
And now you have the same result as when you ran git merge: main points to the new commit, which has "Commit 5" and "Commit 6" as its parents. You can use git lol to verify that.
So this is exactly the same result as the merge done by Git, with the exception of the timestamp and thus the SHA-1 value, of course.
Overall, you merged both the contents of the two commits (that is, the state of the files) and the histories of those commits, by creating a merge commit that points to both histories.
In this simple case, you could actually just apply the patches using git apply, and everything worked quite well.
Quick Recap of a Three-way Merge
So, to quickly recap, in a three-way merge, Git:
- First, locates the merge base: the common ancestor of the two branches, that is, the first commit that is reachable from both branches.
- Second, calculates two diffs: one diff from the merge base to the first branch, and another diff from the merge base to the second branch.
- Third, applies both patches to the merge base, using a 3-way merge algorithm. I haven't explained the 3-way merge algorithm yet, but I will elaborate on it later. The result is the state of the new merge commit.
You can also understand why it's called a "3-way merge": Git merges three different states, namely that of the first branch, that of the second branch, and that of their common ancestor. In our previous example, these are main, paul_branch, and the merge commit (with "Commit 3" and "Commit 4" as parents), respectively.
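Why the third state is needed can be shown with a whole-file simplification in Python. This is not Git's algorithm, since Git merges within files, hunk by hunk. It assumes hypothetical path-to-contents snapshots, and the rule it applies to each path is:

- If only one side changed the path relative to the merge base, that side's version wins.
- If both sides made the same change, either version is taken.
- Only when both changed it differently is there a conflict.

Without the merge base, a file present on one side and missing on the other would be ambiguous: was it added by one side, or deleted by the other?

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # hypothetical: file path -> contents


def three_way_merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Merge two snapshots against their common ancestor, one whole file at a time."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)  # None = file absent
        if o == t:
            result = o  # both sides agree (unchanged, same edit, or both deleted)
        elif o == b:
            result = t  # only "theirs" changed this file
        elif t == b:
            result = o  # only "ours" changed this file
        else:
            conflicts.append(path)  # both sides changed it differently
            result = o  # placeholder: keep our version and report the conflict
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Hypothetical snapshots mirroring the example: each branch added a different song
base = {"README.md": "Our band\n"}
main = {**base, "john_song.txt": "John's song\n"}          # "Commit 5"
paul_branch = {**base, "paul_song.txt": "Paul's song\n"}   # "Commit 6"

merged, conflicts = three_way_merge(base, main, paul_branch)
print(sorted(merged), conflicts)  # ['README.md', 'john_song.txt', 'paul_song.txt'] []
```

Note the deletion case: if one side removed a file the other left untouched, the base shows that the file existed before. The sketch therefore treats the removal as a change and drops the file.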
This is unlike, say, the fast-forward examples we saw before. The fast-forward examples are actually a case of a two-way merge, as Git only compares two states: for example, where main pointed to, and where john_branch pointed to.
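In the fast-forward case, the merge base of the two branches is simply the commit the target branch points to, so there is nothing to combine: the branch can just move forward. Here is a minimal Python sketch of that check on a hypothetical history (commit names are illustrative). Git exposes a similar test as git merge-base --is-ancestor.

```python
from typing import Dict, Tuple

Graph = Dict[str, Tuple[str, ...]]  # hypothetical: commit -> parent commits


def is_ancestor(graph: Graph, maybe_ancestor: str, commit: str) -> bool:
    """True if `maybe_ancestor` is reachable from `commit` by following parents."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return False


def can_fast_forward(graph: Graph, target: str, source: str) -> bool:
    """Merging `source` into `target` is a fast-forward if `target` is an ancestor of `source`."""
    return is_ancestor(graph, target, source)


history: Graph = {"c1": (), "c2": ("c1",), "c3": ("c2",), "c4": ("c2",)}
print(can_fast_forward(history, "c2", "c3"))  # True: just move the branch pointer
print(can_fast_forward(history, "c3", "c4"))  # False: the histories diverged
```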
Moving on
Still, this was a simple case of a 3-way merge. John and Paul created different songs, so each of them touched a different file. It was pretty straightforward to execute the merge.
What about more interesting cases?
Let's assume that now John and Paul are co-authoring a new song.
So, John checked out the main branch and started writing the song:
git checkout main