Part 2 - Branching and Integrating Changes


Gitting Things Done – A Visual and Practical Guide to Git [Full Book]


Chapter 6 - Diffs and Patches

In Part 1 you learned how Git works under the hood, the different Git objects, and how to create a repo from scratch.

When teams work with Git, they introduce sequences of changes, usually in branches, and then they need to combine different change histories together. To really understand how this is achieved, you should learn how Git treats diffs and patches. You will then apply your knowledge to understand the process of merge and rebase.

Many of the interesting processes in Git like merging, rebasing, or even committing are based on diffs and patches. Developers work with diffs all the time, whether using Git directly or relying on the IDE's diff view. In this chapter, you will learn what Git diffs and patches are, their structure, and how to apply patches.

As a reminder from the chapter on Git Objects, a commit is a snapshot of the working tree at a certain point in time, in addition to some metadata.

Yet, it is really hard to make sense of individual commits by looking at the entire working tree. Rather, it is more helpful to look at how different a commit is from its parent commit, that is, the diff between these commits.

So, what do I mean when I say "diff"? Let's start with some history.

Git Diff's History

Git's diff is based on the diff utility on Unix systems. diff was developed in the early 1970s on the Unix operating system, and the first released version shipped with the Fifth Edition of Unix in 1974.

git diff is a command that takes two inputs, and computes the difference between them. Inputs can be commits, but also files, and even files that have never been introduced to the repository.

Git diff takes two inputs, which can be commits or files

This is important - git diff computes the difference between two strings, which most of the time happen to consist of code, but not necessarily.

Time to Get Hands-On

As always, you are encouraged to run the commands yourself while reading this chapter. Unless noted otherwise, I will use the following repository:

Omerr/gitting_things_repo


You can clone it locally and have the same starting point I am using for this chapter.

Consider this short text file on my machine, called file.txt, which consists of 6 lines:

`file.txt` consists of six lines

Now, modify this file a bit. Remove the second line, and insert a new line as the fourth line. Add an exclamation mark (!) to the end of the last line, so you get this result:

After modifying `file.txt`, we get six different lines

Save this file under a new name, `new_file.txt`.

Now you can run git diff to compute the difference between the files like so:

git diff --no-index file.txt new_file.txt

I will explain the --no-index switch of this command later. For now, it's enough to understand that it allows us to compare two files that are not part of a Git repository.

The output of git diff shows quite a lot of things.

Focus on the part starting with This is a file. You can see that the added line (// new test) is preceded by a + sign. The deleted line is preceded by a - sign.

Interestingly, notice that Git views a modified line as a sequence of two changes - erasing a line and adding a new line instead. So the patch includes deleting the last line, and adding a new line that's equal to that line, with the addition of a !.

Addition lines are preceded by `+`, deletion lines by `-`, and modification lines are sequences of deletions and additions
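If you'd like to try this outside Git, Python's standard difflib module produces the same unified format. The sketch below is minimal, and its two-line file contents are hypothetical stand-ins, since the real file.txt appears only in a screenshot. Also note that difflib's matching algorithm is not the same as Git's default (Myers) algorithm, so on larger inputs the two tools may group changes differently.

```python
import difflib

# Hypothetical two-line stand-ins for file.txt and new_file.txt.
old = ["This is a file\n", "Are belong to you\n"]
new = ["This is a file\n", "Are belong to you!\n"]

def changed_lines(a, b):
    """Return only the '+'/'-' lines of a unified diff, skipping the ---/+++ file header."""
    return [
        line.rstrip("\n")
        for line in difflib.unified_diff(a, b, "a/file.txt", "b/file.txt")
        if line.startswith(("+", "-")) and not line.startswith(("---", "+++"))
    ]

for line in changed_lines(old, new):
    print(line)
# -Are belong to you
# +Are belong to you!
```

The single modified line appears as one deletion followed by one addition, the same way Git shows it.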

Now would be a good time to discuss the terms "patch" and "diff". These two are often used interchangeably, although there is a distinction, at least historically.

A diff shows the differences between two files, or snapshots, and can be quite minimal in doing so. A patch is an extension of a diff, augmented with further information such as context lines and filenames, which allow it to be applied more widely. It is a text document that describes how to alter an existing file or codebase.

These days, the Unix diff program, and git diff, can produce patches of various kinds.

A patch is a compact representation of the differences between two files. It describes how to turn one file into another.

In other words, if you apply the "instructions" produced by git diff on file.txt - that is, remove the second line, insert // new test as the fourth line, remove the last line, and add instead a line with the same content and ! - you will get the content of new_file.txt.

Another important thing to note is that a patch is asymmetric: the patch from file.txt to new_file.txt is not the same as the patch for the other direction. Generating a patch from new_file.txt to file.txt, in this order, would produce exactly the opposite instructions: add the second line instead of removing it, and so on.

A patch consists of asymmetric instructions to get from one file to another

Try it out:

git diff --no-index new_file.txt file.txt

Running git diff in the reverse direction yields the reverse instructions - add a line instead of removing it, and so on
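The asymmetry is easy to check in code. This sketch uses Python's difflib as a stand-in for git diff, with hypothetical, trimmed-down file contents. It builds the patch in both directions and confirms that whatever one direction removes, the other adds.

```python
import difflib

def patch_body(a, b):
    """Body lines of a unified diff that turns a into b (file and hunk headers dropped)."""
    return [
        line
        for line in difflib.unified_diff(a, b, "a", "b", lineterm="")
        if not line.startswith(("---", "+++", "@@"))
    ]

# Hypothetical contents, trimmed down from the chapter's example.
file_txt = ["This is a file", "It has a nice poem:", "Are belong to you"]
new_file_txt = ["This is a file", "Are belong to you!"]

forward = patch_body(file_txt, new_file_txt)   # like: git diff --no-index file.txt new_file.txt
backward = patch_body(new_file_txt, file_txt)  # like: git diff --no-index new_file.txt file.txt

removed_forward = {line[1:] for line in forward if line.startswith("-")}
added_backward = {line[1:] for line in backward if line.startswith("+")}
print(removed_forward == added_backward)  # True: what one direction removes, the other adds
```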

The patch format uses context, as well as line numbers, to locate differing file regions. This allows a patch to be applied to a somewhat earlier or later version of the first file than the one from which it was derived, as long as the applying program can still locate the context of the change. We will see exactly how these are used.

The Structure of a Diff

It's time to dive deeper.

Generate a diff from file.txt to new_file.txt again, and consider the output more carefully:

git diff --no-index file.txt new_file.txt

The first line introduces the compared files. Git always gives one file the name a, and the other the name b. So in this case file.txt is called a, whereas new_file.txt is called b.

The first line in `diff`'s output introduces the files being compared

Then the second line, starting with index, includes the blob SHAs of these files. So even though in our case they are not even stored within a Git repo, Git shows their corresponding SHA-1 values.
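Where do these SHAs come from? A blob's ID is the SHA-1 of a short header followed by the file's raw bytes. The header is the word blob, a space, the content length in bytes, and a NUL byte. Because the ID depends only on the content, Git can compute it even for files that live outside any repository. The sketch below assumes a repository using the default SHA-1 object format. Keep in mind that the index line shows only an abbreviated prefix of each ID.

```python
import hashlib

def blob_sha1(content: bytes) -> str:
    """Compute a Git blob ID: SHA-1 over 'blob <size>' + NUL + the raw content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(blob_sha1(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
print(blob_sha1(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

You can compare these results with the output of git hash-object <file>, which prints the full ID for a file on disk.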

The third value in this line, 100644, is the "mode bits", indicating that this is a "regular" file: not executable and not a symbolic link.
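The mode is written in octal. As a rough illustration, Python's stat module can decode the values Git typically uses for files: 100644 for a regular file, 100755 for an executable file, and 120000 for a symbolic link. The helper name describe_mode is just for this sketch.

```python
import stat

def describe_mode(mode_text):
    """Interpret the octal mode Git prints in a diff header, e.g. '100644'."""
    mode = int(mode_text, 8)
    if stat.S_ISREG(mode):
        kind = "regular file"
    elif stat.S_ISLNK(mode):
        kind = "symbolic link"
    else:
        kind = "other"
    return kind, stat.filemode(mode)

for text in ("100644", "100755", "120000"):
    print(text, *describe_mode(text))
```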

The two dots (..) between the blob SHAs here are just a separator (unlike other places in Git, where .. has a special meaning).

The second line in `diff`'s output includes the blob SHAs of the compared files, as well as the mode bits

Other header lines might indicate the old and new mode bits if they've changed, old and new filenames if the files were being renamed, and so on.

The blob SHAs (also called "blob IDs") are helpful if this patch is later applied by Git to the same project and there are conflicts while applying it. You will better understand what this means when you learn about merges in the next chapter.

After the blob IDs, we have two lines: one starting with --- and the other starting with +++. This is the traditional "unified diff" header, again showing the files being compared and the direction of the changes: - signs show lines in the A version that are missing from the B version, and + signs show lines missing in the A version but present in B.

If the patch were of this file being added or deleted in its entirety, then one of these would be /dev/null to signal that.

`-` signs show lines in the A version but missing from the B version, and `+` signs show lines missing in the A version but present in B

Consider the case where you delete a file:

rm awesome.txt

And then use git diff:

The A version, representing the state of the index, is currently awesome.txt, compared to the working dir where this file does not exist, so it is /dev/null. All lines are preceded by - signs as they exist only in the A version.

For now, undo the deletion (more on undoing changes in Part 3):

git restore awesome.txt

Going back to the diff we started with:

After this unified diff header, we get to the main part of the diff, consisting of "difference sections", also called "hunks" or "chunks" in Git. Note that these terms are used interchangeably, and you may stumble upon either of them in Git's documentation and tutorials, as well as Git's source code.

Every hunk begins with a single line, starting with two @ signs. These signs are followed by at most four numbers, and then a header for the chunk - which is an educated guess by Git. Usually, it will include the beginning of a function or a class, when possible.

In this example it doesn't include anything as this is a text file, so consider another example for a moment:

git diff --no-index example.py example_changed.py

When possible, Git includes a header for each hunk, for example a function or class definition

In the image above, the hunk's header includes the beginning of the function that includes the changed lines - def example_function(x).

Back to our previous example then:

After the two @ signs, you'll find four numbers:

The first two numbers are preceded by a - sign, as they refer to file A. The first of them is the line number of the first line in file A that this hunk covers. In the example above, it is 1, meaning that the line This is a file is line number 1 in file A.

This number is followed by a comma (,), and then the number of lines this chunk consists of in file A. This number includes all context lines (the lines preceded with a space in the diff), or lines marked with a - sign, as they are part of file A, but not lines marked with a + sign, as they do not exist in file A.

In our example, this number is 6, counting the context line This is a file, the - line It has a nice poem:, then the three context lines, and lastly Are belong to you.

As you can see, the lines beginning with a space character are context lines, which means they appear as shown in both file A and file B.

Then, we have a + sign to mark the two numbers that refer to file B. First, there's the line number corresponding to the first line in file B, followed by the number of lines this chunk consists of in file B.

This number includes all context lines, as well as lines marked with the + sign, as they are part of file B, but not lines marked with a - sign.

These four numbers are followed by two additional @ signs.
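To make the header format concrete, here is a small Python sketch that parses a hunk header and recounts the lines in a hunk body. The helper names are hypothetical, and so are the placeholder context lines in the sample body, since the real ones appear only in the screenshot. One detail worth knowing: the unified format omits a count when it equals 1, which is why a header can carry fewer than four numbers.

```python
import re

# @@ -<a_start>[,<a_count>] +<b_start>[,<b_count>] @@ [section heading]
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$")

def parse_hunk_header(line):
    """Return (a_start, a_count, b_start, b_count, heading) for a hunk header line."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    a_start, a_count, b_start, b_count, heading = match.groups()
    # A missing count means the hunk covers exactly one line on that side.
    return (int(a_start), int(a_count or 1), int(b_start), int(b_count or 1), heading.strip())

def count_hunk_lines(body):
    """Count how many lines a hunk body spans in file A and in file B."""
    a_lines = sum(1 for line in body if line.startswith((" ", "-")))  # context + deletions
    b_lines = sum(1 for line in body if line.startswith((" ", "+")))  # context + additions
    return a_lines, b_lines

# Hunk body modeled on this chapter's example; "context 1-3" are placeholder lines.
body = [
    " This is a file", "-It has a nice poem:", " context 1", " context 2",
    "+// new test", " context 3", "-Are belong to you", "+Are belong to you!",
]
print(parse_hunk_header("@@ -1,6 +1,6 @@"))  # (1, 6, 1, 6, '')
print(count_hunk_lines(body))                 # (6, 6)
```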

After the header of the chunk, we get the actual lines - either context, -, or + lines.

Typically and by default, a hunk starts and ends with three context lines. For example, if you modify lines 4-5 in a file with nine lines:

Line 1 - context line (before the changed lines)
Line 2 - context line (before the changed lines)
Line 3 - context line (before the changed lines)
Line 4 - changed line
Line 5 - another changed line
Line 6 - context line (after the changed lines)
Line 7 - context line (after the changed lines)
Line 8 - context line (after the changed lines)
Line 9 - this line will not be part of the hunk

So by default, changing lines 4-5 results in a hunk consisting of lines 1-8, that is, three lines before and three lines after the modified lines.

If that file doesn't have nine lines, but rather six lines - then the hunk will contain only one context line after the changed lines, and not three. Similarly, if you change the second line of a file, then there would be only one line of context before the changed lines.
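You can watch the context window at work with Python's difflib, whose n parameter plays the role of Git's -U<n> (--unified=<n>) option. This sketch uses the nine hypothetical lines above and changes lines 4 and 5. The expected headers in the comments follow from three lines of context on each side, clipped at the edges of the file.

```python
import difflib

def hunk_headers(old, new, context=3):
    """Return the @@ lines difflib emits when diffing old -> new with `context` lines."""
    return [
        line.rstrip("\n")
        for line in difflib.unified_diff(old, new, "a/file", "b/file", n=context)
        if line.startswith("@@")
    ]

nine = [f"Line {i}\n" for i in range(1, 10)]
changed = list(nine)
changed[3] = "Line 4 - changed\n"  # line 4
changed[4] = "Line 5 - changed\n"  # line 5

print(hunk_headers(nine, changed))          # ['@@ -1,8 +1,8 @@']: lines 1-8
print(hunk_headers(nine[:6], changed[:6]))  # ['@@ -1,6 +1,6 @@']: one context line after
print(hunk_headers(nine, changed, 1))       # ['@@ -3,4 +3,4 @@']: like git diff -U1
```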

How to Produce Diffs

The last example we considered shows a diff between two files. A single patch file can contain the differences for any number of files, and git diff produces diffs for all altered files in the repository in a single patch.

Often, you will see the output of git diff showing two versions of the same file and the difference between them.

To demonstrate, consider the state in another branch called diffs:

git checkout diffs

Again, I encourage you to run the commands with me - make sure you clone the repository from:

Omerr/gitting_things_repo


At the current state, the active directory is a Git repository, with a clean status:

Take an existing file, my_file.py:

An example file - `my_file.py`

And change the second line from print('An example function!') to print('An example function! And it has been changed!'):

The contents of `my_file.py` after modifying the second line

Save your changes, but don't stage or commit them. Next, run git diff:

The output of `git diff` for `my_file.py` after changing it

The output of git diff shows the difference between my_file.py's version in the staging area, which in this case is the same as the last commit (HEAD), and the version in the working directory.

I covered the terms "working directory", "staging area", and "commit" in the Git Objects chapter, so check it out in case you would like to refresh your memory. As a reminder, the terms "staging area" and "index" are interchangeable, and both are widely used.

At this state, the status of the working dir is different from the status of the index. The status of the index is the same as that of `HEAD`

To see the difference between the working dir and the staging area, use git diff, without any additional flags.

Without switches, `git diff` shows the difference between the staging area and the working directory

As you can see, git diff lists both file A and file B as my_file.py. File A refers to the version of my_file.py in the staging area, whereas file B refers to its version in the working dir.

Note that if you modify my_file.py in a text editor, and don't save the file, then git diff will not be aware of the changes you've made. This is because they haven't been saved to the working dir.

We can provide a few switches to git diff to get the diff between the working dir and a specific commit, or between the staging area and the latest commit, or between two commits, and so on.

First create a new file, new_file.txt, and save it:

A simple new file saved as `new_file.txt`

Currently the file is in the working dir, and it is actually untracked in Git.

Now stage and commit this file:

git add new_file.txt
git commit -m "Commit 3"

Now, the state of HEAD is the same as the state of the staging area, as well as the working tree:

The state of `HEAD` is the same as the index and the working dir

Next, edit new_file.txt by adding a new line at the beginning and another new line at the end:

Modifying `new_file.txt` by adding a line in the beginning and another in the end

As a result, the state is as follows:

After saving, the state in the working dir is different than that of the index or `HEAD`

A nice trick would be to use git add -p, which allows you to split the changes even within a file, and consider which ones you'd like to stage.

In this case, add the first line to the index, but not the last line. To do that, you can split the hunk using s, then accept to stage the first hunk (using y), and not the second part (using n).

If you are not sure what each letter stands for, you can always use a ? and Git will tell you.

Using `git add -p`, you can stage only the first change

So now the state in HEAD is without either of those new lines. In the staging area you have the first line but not the last line, and in the working dir you have both new lines.

The state after staging only the first line

If you use git diff, what will happen?

`git diff` shows the difference between the index and the working dir

Well, as stated before, you get the diff between the staging area and the working tree.

What happens if you want to get the diff between HEAD and the staging area? For that, you can use git diff --cached:

`git diff --cached` shows the difference between `HEAD` and the index

And what if you want the difference between HEAD and the working tree? For that you can run git diff HEAD:

`git diff HEAD` shows the difference between `HEAD` and the working dir

To summarize the different switches for git diff we have seen so far, here's a diagram:

As a reminder, at the beginning of this chapter you used git diff --no-index. With the --no-index switch, you can compare two files that are not part of the repository - or of any staging area.
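One way to keep the three comparisons straight is to model HEAD, the index, and the working dir as three versions of new_file.txt, then diff each pair. This is a toy model, not a description of how Git is implemented. The END! line is a hypothetical stand-in for the last line you added, whose actual content appears only in a screenshot.

```python
import difflib

# Three versions of new_file.txt; "END!" is a hypothetical stand-in for the last line.
head = ["This is a new file", "With new content!"]
index = ["START!"] + head      # only the first new line was staged
workdir = index + ["END!"]     # the working dir has both new lines

def added(a, b):
    """Lines that a unified diff from a to b marks with '+'."""
    return [line[1:] for line in difflib.unified_diff(a, b, lineterm="")
            if line.startswith("+") and not line.startswith("+++")]

comparisons = {
    "git diff": (index, workdir),        # staging area -> working dir
    "git diff --cached": (head, index),  # HEAD -> staging area
    "git diff HEAD": (head, workdir),    # HEAD -> working dir
}
for command, (a, b) in comparisons.items():
    print(f"{command:20} adds {added(a, b)}")
```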

Now, commit the changes you have in the staging area:

git commit -m "Commit 4"

To observe the diff between this commit and its parent commit, you can run the following command:

git diff HEAD~1 HEAD

By the way, you can omit the 1 above and write HEAD~, and get the same result. Using 1 is the explicit way to state you are referring to the first parent of the commit.
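In other words, HEAD~n means "follow the first parent n times." Here is a minimal sketch over a hypothetical history, stored as a dict that maps each commit to its list of parents, first parent first. Following only first parents also matters for merge commits, which you'll meet in the next chapter. For a merge commit, HEAD~ still picks the first parent, while HEAD^2 names the second one.

```python
def nth_first_parent(commits, start, n=1):
    """Resolve start~n by following the first parent n times."""
    sha = start
    for _ in range(n):
        parents = commits[sha]
        if not parents:
            raise ValueError(f"{sha} has no parent")
        sha = parents[0]  # ~ always takes the FIRST parent
    return sha

# Hypothetical linear history: "Commit 4" -> "Commit 3" -> ... -> "Commit 1".
commits = {"c4": ["c3"], "c3": ["c2"], "c2": ["c1"], "c1": []}
HEAD = "c4"

print(nth_first_parent(commits, HEAD))     # c3, like HEAD~ or HEAD~1
print(nth_first_parent(commits, HEAD, 2))  # c2, like HEAD~2
```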

Note that since the parent commit, HEAD~1, comes first, the result is a diff showing how to get from the parent commit to the current commit. Of course, I could also generate the reverse diff by writing:

git diff HEAD HEAD~1

The output of `git diff HEAD HEAD~1` generates the reverse patch

To summarize all the different switches for git diff we covered in this section, see this diagram:

A short way to view the diff between a commit and its parent is by using git show, for example:

git show HEAD

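Apart from the commit's metadata (such as its author, date, and message), which git show prints above the patch, this is the same as writing: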

git diff HEAD~ HEAD

We can now update our diagram:

`git diff HEAD~ HEAD` is used to show the difference between commits

You can go back to this diagram as a reference when needed.

As a reminder, Git commits are snapshots of the entire working directory of the repository at a certain point in time. Yet, it's sometimes more useful to regard a commit not as a whole snapshot, but by the changes this specific commit introduced. In other words, by the diff between its parent commit and the commit itself.

As you learned in the Git Objects chapter, Git stores entire snapshots. The diff is dynamically generated from the snapshot data - by comparing the root trees of the commit and its parent.
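Here is a simplified sketch of that comparison, which treats a snapshot as a flat mapping from file path to blob SHA. Real Git walks tree objects recursively and can skip whole subtrees whose SHAs match, so take this as a model of the idea rather than as Git's implementation. The snapshots and their short SHA values are placeholders.

```python
def diff_snapshots(old, new):
    """Classify every path whose blob differs between two snapshots (path -> blob SHA)."""
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes[path] = "deleted"
        elif path not in old:
            changes[path] = "added"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes

# Hypothetical snapshots; the short "SHAs" are placeholders, not real object IDs.
parent = {"my_file.py": "1111111", "new_file.txt": "2222222"}
child = {"my_file.py": "1111111", "new_file.txt": "3333333", "test.txt": "4444444"}
print(diff_snapshots(parent, child))  # {'new_file.txt': 'modified', 'test.txt': 'added'}
```

Only the changed paths then need a line-level diff, which is where the hunks you saw earlier come from.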

Of course, Git can compare any two snapshots in time, not just adjacent commits, and also generate a diff of files not included in a repository.

How to Apply Patches

By using git diff you can see a patch Git generates, and you can then apply this patch using git apply.

Historical Note

Actually, sharing patches used to be the main way to share code in the early days of open source. These days, most projects share Git commits directly through pull requests (called "merge requests" on some platforms), although some, such as the Linux kernel, still exchange patches over email.

The biggest problem with using patches is that it is hard to apply a patch when your working directory does not match the sender's previous commit. Losing the commit history makes it difficult to resolve conflicts. You will better understand this as you dive deeper into the process of git apply, especially in the next chapter where we cover merges.

A Simple Patch

What does it mean to apply a patch? It's time to try it out!

Take the output of git diff:

git diff HEAD~1 HEAD

And store it in a file:

git diff HEAD~1 HEAD > my_patch.patch

Use reset to undo the last commit:

git reset --hard HEAD~1

Don't worry about the last command - I'll explain it in detail in Part 3, where we discuss undoing changes. In short, it moves the branch that HEAD points to, and resets the index and the working dir to match. In the example above, they are all set to the state of HEAD~1, or "Commit 3" in the diagram.

So after running the reset command, the contents of the file are as follows (the state from "Commit 3"):

nano new_file.txt

![new_file.txt](https://freecodecamp.org/news/content/images/2023/12/nano_new_file-1.png)

And you will apply this patch that you've just saved:

nano my_patch.patch

The patch you are about to apply, as generated by git diff

This patch tells Git to find the lines:

This is a new file
With new content!

These lines used to be lines 1 and 2 in new_file.txt. The patch tells Git to add a line with the content START! right above them.
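Put differently, every hunk can be read as two blocks of lines. The first is the lines Git must find in the file (the context and - lines). The second is the lines that replace them (the context and + lines). These are usually called the preimage and the postimage, terms borrowed from Git's apply code. Here is a small sketch, with the hunk body written out from the description above.

```python
def split_hunk(body):
    """Split a hunk body into the lines to find (preimage) and their replacement (postimage)."""
    preimage = [line[1:] for line in body if line.startswith((" ", "-"))]
    postimage = [line[1:] for line in body if line.startswith((" ", "+"))]
    return preimage, postimage

# The hunk from my_patch.patch: one added line, then two context lines.
body = ["+START!", " This is a new file", " With new content!"]
preimage, postimage = split_hunk(body)
print(preimage)   # ['This is a new file', 'With new content!']
print(postimage)  # ['START!', 'This is a new file', 'With new content!']
```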

Run this command to apply the patch:

git apply my_patch.patch

And as a result, you get this version of your file, just like the commit you have created before:

nano new_file.txt

The contents of after applying the patch — The contents of `new_file.txt` after applying the patch

Understanding the Context Lines

To understand the importance of context lines, consider a more advanced scenario. What happens if line numbers have changed since you created the patch file?

To test, start by creating another file:

nano test.txt

Creating another file - `test.txt`

Stage and commit this file:

git add test.txt

git commit -m "Test file"

Now, change this file by adding a new line, and also erasing the line before the last one:

Changes to `test.txt`

Observe the difference between the original version of the file and the version including your changes:

git diff -- test.txt

(The -- separates options from paths, so -- test.txt tells git diff to consider only test.txt, and you don't get the diff for other files.)

Store this diff into a patch file:

git diff -- test.txt > new_patch.patch

Now, reset your state to that before introducing the changes:

git reset --hard

If you were to apply new_patch.patch now, it would simply work.

Let's now consider a more interesting case. Modify test.txt again by adding a new line at the beginning:

Adding a new line at the beginning of `test.txt`

As a result, the line numbers are different from those in the original version from which the patch was created. Consider the patch you created before:

`new_patch.patch`

It assumes that the line With more text is the second line in test.txt, which is no longer the case. So...will git apply work?

git apply new_patch.patch

It worked!

By default, Git looks for three lines of context before and after each change introduced in the patch. As you can see, these lines are included in the patch file. Take the three lines before and after the added line, plus the three lines before and after the deleted line (actually only one line after, as no other lines exist). Together, these are exactly the lines in the patch file. As long as these lines still exist in the file, adjacent to each other and in the same order, the patch applies, even if the line numbers changed.
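The key point is that Git locates the context by content, and uses the line numbers only as a hint for where to start looking. The sketch below captures that idea in simplified form. It scans the whole file from top to bottom for the hunk's preimage, whereas Git starts near the expected line and searches outward. Both the version of test.txt and the hunk are hypothetical reconstructions, since the real ones appear only in screenshots.

```python
def apply_hunk(lines, body):
    """Apply one hunk by finding its preimage anywhere in `lines` (simplified git apply)."""
    preimage = [l[1:] for l in body if l.startswith((" ", "-"))]
    postimage = [l[1:] for l in body if l.startswith((" ", "+"))]
    size = len(preimage)
    for start in range(len(lines) - size + 1):
        if lines[start:start + size] == preimage:
            return lines[:start] + postimage + lines[start + size:]
    raise ValueError("patch does not apply: context not found")

# A hypothetical version of test.txt, plus a hunk that adds one line with 3 lines of context.
original = [
    "This is a test file", "With more text", "It has some really nice lines",
    "Like this one", "And that one", "How wonderful",
    "So we are writing an example", "Git is awesoome!",
]
hunk = [
    " With more text", " It has some really nice lines", " Like this one",
    "+!!!This is the new line!!!",
    " And that one", " How wonderful", " So we are writing an example",
]

shifted = ["A brand new first line"] + original  # every line number moved down by one
print(apply_hunk(shifted, hunk)[5])              # !!!This is the new line!!!
```

If you change one of the context lines in shifted, the search fails and apply_hunk raises an error. As you're about to see, git apply behaves the same way.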

Reset the state again:

git reset --hard

What happens if you change one of the context lines? Try it out by changing the line With more text to With more text!:

Changing the line `With more text` to `With more text!`

And now:

git apply new_patch.patch

Well, no. The patch does not apply. If you are not sure why, or just want to better understand the process Git is performing, you can add the --verbose flag to git apply, like so:

git apply --verbose new_patch.patch

`git apply --verbose` shows the process Git is taking to apply the patch

It seems that Git searched for a sequence of lines from the file that includes the line "With more text" right before the line "It has some really nice lines". This sequence no longer exists in the file. As Git cannot find it, it cannot apply the patch.

As mentioned earlier, by default, Git looks for 3 lines of context before and after each change introduced in the patch. If the surrounding three lines do not exist, Git cannot apply the patch.

You can ask Git to rely on fewer lines of context, using the -C argument. For example, to ask Git to look for 1 line of the surrounding context, run the following command:

git apply -C1 new_patch.patch

The patch applies!

Why is that? Consider the patch again:

When applying the patch with the -C1 option, Git is looking for the lines:

Like this one
And that one

in order to add the line !!!This is the new line!!! between these two lines. These lines exist (and, importantly, they appear one right after the other). As a result, Git can successfully add the line between them, even though the line numbers changed.

Similarly, Git would look for the lines:

How wonderful
So we are writing an example
Git is awesoome!

As Git can find these lines, Git can erase the middle one.

If we changed one of these lines, say, changed "How wonderful" to "How very wonderful", then Git would not be able to find the sequence of lines above, and thus the patch would not apply.
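Here is a rough model of what -C1 changes: trim the hunk down to one context line before its first change and one after its last, then search for what remains. This is a simplification. Git first tries the full context, and only when that fails does it drop context lines, down to the limit you pass. The file and hunk here are again hypothetical.

```python
def trim_context(body, keep):
    """Keep at most `keep` context lines before the first change and after the last one."""
    changed = [i for i, l in enumerate(body) if not l.startswith(" ")]
    return body[max(0, changed[0] - keep):changed[-1] + 1 + keep]

def apply_hunk(lines, body):
    """Apply one hunk by finding its preimage anywhere in `lines` (simplified git apply)."""
    preimage = [l[1:] for l in body if l.startswith((" ", "-"))]
    postimage = [l[1:] for l in body if l.startswith((" ", "+"))]
    for start in range(len(lines) - len(preimage) + 1):
        if lines[start:start + len(preimage)] == preimage:
            return lines[:start] + postimage + lines[start + len(preimage):]
    raise ValueError("patch does not apply: context not found")

# Hypothetical test.txt after "With more text" became "With more text!".
edited = [
    "This is a test file", "With more text!", "It has some really nice lines",
    "Like this one", "And that one", "How wonderful",
    "So we are writing an example", "Git is awesoome!",
]
hunk = [
    " With more text", " It has some really nice lines", " Like this one",
    "+!!!This is the new line!!!",
    " And that one", " How wonderful", " So we are writing an example",
]

print(trim_context(hunk, 1))
# [' Like this one', '+!!!This is the new line!!!', ' And that one']
print(apply_hunk(edited, trim_context(hunk, 1))[4])  # !!!This is the new line!!!
```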

Recap - Git Diff and Patch

In this chapter, you learned what a diff is, and the difference between a diff and a patch. You learned how to generate various patches using different switches for git diff. You also learned what the output of git diff looks like, and how it is constructed. Ultimately, you learned how patches are applied, and specifically the importance of context.

Understanding diffs is a major milestone for understanding many other processes within Git, for example merging or rebasing, which we will explore in the next chapters.

Chapter 7 - Understanding Git Merge

By reading this chapter, you are going to really understand git merge, one of the most common operations you'll perform in your Git repositories.

What is a Merge in Git?

Merging is the process of combining the recent changes from several branches into a single new commit. This commit points back to the commits these branches pointed to, which become its parents.

In a way, merging is the complement of branching in version control: a branch allows you to work simultaneously with others on a particular set of files, whereas a merge allows you to later combine separate work on branches that diverged from a common ancestor commit.

OK, let's take this bit by bit.

Remember that in Git, a branch is just a name pointing to a single commit. When we think about commits as being "on" a specific branch, they are actually reachable through the parent chain from the commit that the branch is pointing to.

That is, if you consider this commit graph:

You see the branch feature_1, which points to a commit with the SHA-1 value of ba0d2. As in previous chapters, I only write the first 5 digits of the SHA-1 value for brevity.

Notice that commit 54a9d is also "on" this branch, as it is the parent commit of ba0d2. So if you start from the pointer of feature_1, you get to ba0d2, which then points to 54a9d. You can go on the chain of parents, and all these reachable commits are considered to be "on" feature_1.
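Since a branch is just a pointer, "the commits on feature_1" is really a graph question: which commits can you reach by following parent links? Here is a minimal sketch. The commits ba0d2 and 54a9d come from the diagram, while 1a2b3 is a made-up root commit added for illustration.

```python
def reachable(commits, start):
    """Every commit reachable from `start` through parent links."""
    seen, stack = set(), [start]
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(commits[sha])
    return seen

commits = {"ba0d2": ["54a9d"], "54a9d": ["1a2b3"], "1a2b3": []}
branches = {"feature_1": "ba0d2"}  # a branch is just a name for one commit

print(sorted(reachable(commits, branches["feature_1"])))  # ['1a2b3', '54a9d', 'ba0d2']
```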

When you merge with Git, you merge commits. Almost always, we merge two commits by referring to them with the branch names that point to them. Thus we say we "merge branches" - though under the hood, we actually merge commits.

Time to Get Hands-on

For this chapter, I will use the following repository:

Omerr/gitting_things_merge

A practice repo accompanying Chapter 7 of Gitting Things Done book

As in previous chapters, I encourage you to clone it locally and have the same starting point I am using for this chapter.

OK, so let's say I have this simple repository here, with a branch called main, and a few commits with the commit messages of "Commit 1", "Commit 2", and "Commit 3":

Next, create a feature branch by typing git branch new_feature:

And switch HEAD to point to this new branch, by using git checkout new_feature (or git switch new_feature). You can look at the outcome by using git log:

The output of `git log` after using `git checkout new_feature`

As a reminder, you could also write git checkout -b new_feature, which would both create a new branch and change HEAD to point to this new branch.

If you need a reminder about branches and how they're implemented under the hood, please check out chapter 2. Yes, check out. Pun intended 😇

Now, on the new_feature branch, implement a new feature. In this example, I will edit an existing file that looks like this before the edit:

And I will now edit it to include a new function:

Implementing `new_feature`

And luckily, this is not a programming book, so this function is legit 😇

Next, stage and commit this change:

git add code.py

git commit -m "Commit 4"

Looking at the history, you have the branch new_feature, now pointing to "Commit 4", which points to its parent, "Commit 3". The branch main is also pointing to "Commit 3".

Time to merge the new feature! That is, merge these two branches, main and new_feature. Or, in Git's lingo, merge new_feature into main. This means merging "Commit 4" and "Commit 3". This is pretty trivial, as after all, "Commit 3" is an ancestor of "Commit 4".

Check out the main branch (with git checkout main), and perform the merge by using git merge new_feature:

Merging `new_feature` into `main`

Since new_feature never really diverged from main, Git could just perform a fast-forward merge. So what happened here? Consider the history:

Even though you used git merge, there was no actual merging here. Actually, Git did something very simple - it reset the main branch to point to the same commit as the branch new_feature.
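The decision Git made here can be sketched in a few lines. A fast-forward is possible when the commit main points to is an ancestor of the commit being merged, and in that case only the branch pointer moves. The commit names below are hypothetical labels for the commits in the history above.

```python
def is_ancestor(commits, ancestor, descendant):
    """True if `ancestor` can be reached from `descendant` by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        sha = stack.pop()
        if sha == ancestor:
            return True
        if sha not in seen:
            seen.add(sha)
            stack.extend(commits[sha])
    return False

def try_fast_forward(commits, branches, current, other):
    """Move `current` to `other`'s commit when no real merge is needed."""
    if is_ancestor(commits, branches[current], branches[other]):
        branches[current] = branches[other]  # no new commit is created
        return True
    return False

commits = {"commit4": ["commit3"], "commit3": ["commit2"],
           "commit2": ["commit1"], "commit1": []}
branches = {"main": "commit3", "new_feature": "commit4"}

print(try_fast_forward(commits, branches, "main", "new_feature"))  # True
print(branches["main"])                                            # commit4
```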

In case you don't want that to happen, but rather want Git to really perform a merge, you could either change Git's configuration (for example, by setting merge.ff to false), or run the merge command with the --no-ff flag.

First, undo the last commit:

git reset --hard HEAD~1

Reminder: if this way of using reset is not clear to you, don't worry - we will cover it in detail in Part 3. It is not crucial for this introduction of merge, though. For now, it's important to understand that it basically undoes the merge operation.

Just to clarify, now if you checked out new_feature again:

git checkout new_feature

The history would look just like before the merge:

The history after using `git reset --hard HEAD~1`

Next, perform the merge with the --no-ff ("no fast-forward") flag:

git checkout main
git merge new_feature --no-ff

Now, if we look at the history using git lol:

History after merging with the `--no-ff` flag

(Reminder: git lol is an alias I added to Git to view the history as a graph. You can find it, along with the other components of my setup, in the My Setup part of the Introduction chapter.)

Considering this history, you can see Git created a new commit, a merge commit.

If you consider this commit a bit closer:

git log -n1

You will see that this commit actually has two parents - "Commit 4", which was the commit that new_feature pointed to when you ran git merge, and "Commit 3", which was the commit that main pointed to.

A merge commit has two parents: the two commits it merged.

The merge commit shows us the concept of merge quite well. Git takes two commits, usually referenced by two different branches, and merges them together.

After the merge, as you started the process from main, you are still on main, and the history from new_feature has been merged into this branch. Since you started with main, then "Commit 3", which main pointed to, is the first parent of the merge commit, whereas "Commit 4", which you merged into main, is the second parent of the merge commit.
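Git's revision syntax exposes this ordering directly: HEAD^1 names the first parent of the current commit and HEAD^2 the second. The following minimal sketch repeats the --no-ff merge in a throwaway repository (the file names and the mk_commit helper are hypothetical) and checks which parent is which:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write MESSAGE into FILE, stage it, and commit it
mk_commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

mk_commit commit_3.txt "Commit 3"
commit3=$(git rev-parse HEAD)
git checkout -q -b new_feature
mk_commit commit_4.txt "Commit 4"
commit4=$(git rev-parse HEAD)

git checkout -q main
git merge -q --no-ff -m "Merge new_feature" new_feature

# First parent: where main pointed before the merge
first=$(git rev-parse HEAD^1)
# Second parent: the commit that was merged into main
second=$(git rev-parse HEAD^2)
git log -1 --format='merge commit %h, parents: %P'
```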

Notice that you started on main when it pointed to "Commit 3", and Git went quite a long way for you. It changed the working tree, the index, and also HEAD and created a new commit object. At least when you use git merge without the --no-commit flag and when it's not a fast-forward merge, Git does all of that.

This was a super simple case, where the branches you merged didn't diverge at all. We will soon consider more interesting cases.

By the way, you can use git merge to merge more than two commits - actually, any number of commits. This is rarely done, and to adhere to the practicality principle of this book, I won't delve into it.

Another way to think of git merge is as joining two or more development histories together. That is, when you merge, you incorporate changes from the named commits, since the time their histories diverged from the current branch, into the current branch. I used the term "branch" here, but I am stressing this again - we are actually merging commits.

Time For a More Advanced Case

Time to consider a more advanced case, which is probably the most common case where we use git merge explicitly - where you need to merge branches that did diverge from one another.

Assume we have two people working on this repo now, John and Paul.

John created a branch:

git checkout -b john_branch

And John has written a new song in a new file, lucy_in_the_sky_with_diamonds.md. Well, I believe John Lennon didn't really write in Markdown format, or use Git for that matter, but let's pretend he did for this explanation.

git add lucy_in_the_sky_with_diamonds.md
git commit -m "Commit 5"

While John was working on this song, Paul was also writing, on another branch. Paul had started from main:

git checkout main

And created his own branch:

git checkout -b paul_branch

And Paul wrote his song into a file called penny_lane.md. Paul staged and committed this file:

git add penny_lane.md
git commit -m "Commit 6"

So now our history looks like this - where we have two different branches, branching out from main, with different histories:

The history after John and Paul committed

John is happy with his branch (that is, his song), so he decides to merge it into the main branch:

git checkout main
git merge john_branch

Actually, this is a fast-forward merge, as we have learned before. You can validate that by looking at the history (using git lol, for example):

Merging `john_branch` into `main` results in a fast-forward merge

At this point, Paul also wants to merge his branch into main, but now a fast-forward merge is no longer possible - there are two different histories here: the history of main and that of paul_branch. It's not that paul_branch only adds commits on top of main, or vice versa.
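Whether a fast-forward is possible boils down to one question: is the commit main points to an ancestor of the branch being merged? git merge-base --is-ancestor answers that through its exit status (0 means yes). Here is a sketch that recreates John's and Paul's branches in a throwaway repository, with placeholder file contents and a hypothetical mk_commit helper:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write MESSAGE into FILE, stage it, and commit it
mk_commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

mk_commit README.md "Base commit"
git checkout -q -b john_branch
mk_commit lucy_in_the_sky_with_diamonds.md "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
mk_commit penny_lane.md "Commit 6"
git checkout -q main

# Before John's merge: main is an ancestor of john_branch
if git merge-base --is-ancestor main john_branch; then john_ff=yes; else john_ff=no; fi
git merge -q john_branch   # fast-forward, as in the text

# After it: main (now "Commit 5") is not an ancestor of paul_branch
if git merge-base --is-ancestor main paul_branch; then paul_ff=yes; else paul_ff=no; fi
echo "fast-forward possible for john_branch: $john_ff, for paul_branch: $paul_ff"
```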

Now things get interesting. 😎😎

First, let Git do the hard work for you. After that, we will understand what's actually happening under the hood.

git merge paul_branch

Consider the history now:

When you merge `paul_branch`, you get a new merge commit

What you have is a new commit, with two parents - "Commit 5" and "Commit 6".

In the working dir, you can see that both John's song and Paul's song are there (if you use ls, you will see both files in the working dir).

Nice, Git really did merge the changes for you. But how does that happen?

Undo this last commit:

git reset --hard HEAD~

How to Perform a Three-way Merge in Git

It's time to understand what's really happening under the hood. 😎

What Git did here is called a 3-way merge. In outlining the process of a 3-way merge, I will use the term "branch" for simplicity, but you should remember you could also merge two (or more) commits that are not referenced by a branch.

The 3-way merge process includes these stages:

First, Git locates the common ancestor of the two branches. That is, the common commit from which the merging branches most recently diverged. Technically, this is actually the first commit that is reachable from both branches. This commit is then called the merge base.

Second, Git calculates two diffs - one diff from the merge base to the first branch, and another diff from the merge base to the second branch. Git generates patches based on those diffs.

Third, Git applies both patches to the merge base using a 3-way merge algorithm. The result is the state of the new merge commit.

![The three steps of the 3-way merge algorithm](https://freecodecamp.org/news/content/images/2023/12/3_way_merge.png)

The three steps of the 3-way merge algorithm: (1) locate the common ancestor; (2) calculate diffs from the merge base to the first branch, and from the merge base to the second branch; (3) apply both patches together
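The first two steps correspond to commands you can run yourself. This sketch recreates the John and Paul scenario in a throwaway repository (placeholder file contents, hypothetical mk_commit helper), finds the merge base of main and paul_branch, and lists what each side changed relative to it:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write MESSAGE into FILE, stage it, and commit it
mk_commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

mk_commit README.md "Base commit"
base_commit=$(git rev-parse HEAD)
git checkout -q -b john_branch
mk_commit lucy_in_the_sky_with_diamonds.md "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
mk_commit penny_lane.md "Commit 6"
git checkout -q main
git merge -q john_branch   # fast-forward

# Step 1: locate the merge base
base=$(git merge-base main paul_branch)

# Step 2: one diff per side, each computed against the merge base
main_changes=$(git diff --name-status "$base" main)
paul_changes=$(git diff --name-status "$base" paul_branch)
printf 'main vs. base:\n%s\npaul_branch vs. base:\n%s\n' "$main_changes" "$paul_changes"
```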

So, back to our example.

In the first step, Git looks at both branches - main and paul_branch - and traverses the history to find the first commit that is reachable from both. In this case, this would be… which commit?

Correct, the merge commit (the one with "Commit 3" and "Commit 4" as its parents).

If you are not sure, you can always ask Git directly:

git merge-base main paul_branch

![The merge base is the merge commit with "Commit 3" and "Commit 4" as its parents](https://freecodecamp.org/news/content/images/2023/12/3_way_merge_base.png)

The merge base is the merge commit with "Commit 3" and "Commit 4" as its parents. Note: the previous merge commit is blurred, as it is no longer reachable from the current history after the reset command

By the way, this is the most common and simple case, where we have a single obvious choice for the merge base. In more complicated cases, there may be multiple possibilities for a merge base, but this is not within our focus.

In the second step, Git calculates the diffs. So it first calculates the diff between the merge commit and "Commit 5":

git diff 4f90a62 4683aef

(The SHA-1 values will be different on your machine.)

![The diff between the merge commit and "Commit 5"](https://freecodecamp.org/news/content/images/2023/12/diff_4_5.png)

The diff between the merge commit and "Commit 5"

If you don't feel comfortable with the output of git diff, you can read the previous chapter where I described it in detail.

You can store that diff to a file:

git diff 4f90a62 4683aef > john_branch_diff.patch

Next, Git calculates the diff between the merge commit and "Commit 6":

git diff 4f90a62 c5e4951

![The diff between the merge commit and "Commit 6"](https://freecodecamp.org/news/content/images/2023/12/diff_4_6.png)

The diff between the merge commit and "Commit 6"

Write this one to a file as well:

git diff 4f90a62 c5e4951 > paul_branch_diff.patch

Now Git applies those patches on the merge base.

First, try that out directly - just apply the patches (I will walk you through it in a moment). This is not what Git really does under the hood, but it will help you gain a better understanding of why Git needs to do something different.

Check out the merge base first, that is, the merge commit:

git checkout 4f90a62

And apply John's patch first (as a reminder, this is the patch shown in the image captioned "The diff between the merge commit and 'Commit 5'"):

git apply --index john_branch_diff.patch

Notice that for now there is no merge commit. git apply updates the working dir as well as the index, as we used the --index switch.
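To confirm that --index stages the change rather than only writing it to the working dir, you can compare the index with HEAD right after applying. The sketch below does this in a throwaway repository (placeholder contents, hypothetical mk_commit helper). It also runs git apply --check first, a dry run that reports whether the patch applies without changing anything:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write MESSAGE into FILE, stage it, and commit it
mk_commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

mk_commit README.md "Base commit"
base=$(git rev-parse HEAD)
git checkout -q -b john_branch
mk_commit lucy_in_the_sky_with_diamonds.md "Commit 5"

# Store John's patch outside the repository
patch=$(mktemp)
git diff "$base" john_branch > "$patch"

git checkout -q "$base"        # detached HEAD at the merge base
git apply --check "$patch"     # dry run: fails if the patch would not apply
git apply --index "$patch"     # update the working dir AND the index

# Files that differ between the index and HEAD, i.e. staged changes
staged=$(git diff --cached --name-only)
git status --short
```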

You can observe the status using git status:

Applying John's patch on the merge commit

So now John's new song is incorporated into the index. Apply the other patch:

git apply --index paul_branch_diff.patch

As a result, the index contains changes from both branches.

Now it's time to commit your merge. In this situation, the porcelain command git commit would create a commit with a single parent (the commit HEAD points to), so you need the underlying plumbing command git commit-tree, which lets you specify each parent explicitly.

If you need a reminder about porcelain vs plumbing commands, check out chapter 4 where I explained these terms, and created an entire repo from scratch.

Remember that every Git commit object points to a single tree. So you need to record the contents of the index in a tree:

git write-tree

Now you get the SHA-1 value of the created tree, and you can create a commit object using git commit-tree:

git commit-tree <TREE_SHA> -p <COMMIT_5> -p <COMMIT_6> -m "Merge commit!"

Great, so you have created a commit object!

Recall that git merge also changes HEAD to point to the new merge commit object. So you can simply do the same:

git reset --hard db315a

If you look at the history now:

The history after creating a merge commit and resetting `HEAD`

Note

In this state, HEAD is "detached" - that is, it points directly to a commit object rather than to a named reference. gg does not show HEAD when it is "detached", so don't be confused if you can't see HEAD in the output of gg.
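If you want to check for a detached HEAD from the command line, git symbolic-ref -q HEAD prints the branch that HEAD refers to, and exits with a non-zero status (silently, thanks to -q) when HEAD points directly at a commit. A small sketch in a throwaway repository, with a hypothetical file name:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > README.md
git add README.md
git commit -q -m "Base commit"

# HEAD refers to refs/heads/main here
if git symbolic-ref -q HEAD > /dev/null; then on_branch=yes; else on_branch=no; fi

# Check out the commit itself rather than the branch: HEAD becomes detached
git checkout -q "$(git rev-parse HEAD)"
if git symbolic-ref -q HEAD > /dev/null; then detached=no; else detached=yes; fi
echo "on a branch before: $on_branch, detached after: $detached"
```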

This is almost what we wanted. Remember that when you ran git merge, the result was HEAD pointing to main, which pointed to the newly created commit (as shown in the image with the caption "When you merge paul_branch, you get a new merge commit"). What should you do then?

Well, what you want is to modify main, so you can just point it to the new commit:

git checkout main
git reset --hard db315a

And now you have the same result as when you ran git merge: main points to the new commit, which has "Commit 5" and "Commit 6" as its parents. You can use git lol to verify that.

So this is exactly the same result as the merge done by Git, except for the timestamp and the commit message, and thus the SHA-1 value, of course.
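You can check this claim by scripting the whole manual procedure. The sketch below (throwaway repository, placeholder song contents, hypothetical mk_commit helper) builds a merge commit with git apply, git write-tree, and git commit-tree, then lets git merge produce its own merge commit on a separate, hypothetical git_merge branch, and compares the two. The tree objects come out identical, while the commit SHA-1 values differ:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write MESSAGE into FILE, stage it, and commit it
mk_commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$2"; }

mk_commit README.md "Base commit"
git checkout -q -b john_branch
mk_commit lucy_in_the_sky_with_diamonds.md "Commit 5"
git checkout -q main
git checkout -q -b paul_branch
mk_commit penny_lane.md "Commit 6"
git checkout -q main
git merge -q john_branch   # fast-forward

# Steps 1 and 2: merge base and the two patches (kept outside the repo)
base=$(git merge-base main paul_branch)
john_patch=$(mktemp)
paul_patch=$(mktemp)
git diff "$base" main > "$john_patch"
git diff "$base" paul_branch > "$paul_patch"

# Step 3, the naive way: apply both patches on top of the merge base
git checkout -q "$base"
git apply --index "$john_patch"
git apply --index "$paul_patch"
tree=$(git write-tree)
manual=$(git commit-tree "$tree" -p main -p paul_branch -m "Merge commit!")

# Point HEAD, then main, at the new merge commit
git reset -q --hard "$manual"
git checkout -q main
git reset -q --hard "$manual"

# For comparison: let git merge do the work, starting from "Commit 5"
git checkout -q -b git_merge john_branch
git merge -q --no-edit paul_branch
auto=$(git rev-parse HEAD)

manual_tree=$(git rev-parse "$manual^{tree}")
auto_tree=$(git rev-parse "$auto^{tree}")
echo "manual: commit $manual, tree $manual_tree"
echo "merge:  commit $auto, tree $auto_tree"
```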

Overall, you got to merge both the contents of the two commits - that is, the state of the files, and also the history of those commits - by creating a merge commit that points to both histories.

In this simple case, you could actually just apply the patches using git apply, and everything works quite well.

Quick Recap of a Three-way Merge

So to quickly recap, in a three-way merge, Git:

First, locates the merge base - the common ancestor of the two branches. That is, the first commit that is reachable from both branches.
Second, calculates two diffs - one diff from the merge base to the first branch, and another diff from the merge base to the second branch.
Third, applies both patches to the merge base, using a 3-way merge algorithm. I haven't explained the 3-way merge algorithm yet, but I will elaborate on it later. The result is the state of the new merge commit.

You can also understand why it's called a "3-way merge": Git merges three different states - that of the first branch, that of the second branch, and their common ancestor. In our previous example, main, paul_branch, and the merge commit (with "Commit 3" and "Commit 4" as parents), respectively.

This is unlike, say, the fast-forward examples we saw before. The fast-forward examples are actually a case of a two-way merge, as Git only compares two states - for example, where main pointed to, and where john_branch pointed to.
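To get a feel for why the common ancestor matters, git merge-file runs a 3-way merge on a single file: it takes the current version, the base, and the other version, and writes the result into the current version's file. In this sketch the three versions are made up. Relative to the base, one side changes the first line and the other changes the last, so Git can combine both changes. With only the two final versions and no base, there would be no way to tell which side changed which line:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# The common ancestor
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > base.txt
# One side changes only the first line
printf 'line 1 (John)\nline 2\nline 3\nline 4\nline 5\n' > ours.txt
# The other side changes only the last line
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (Paul)\n' > theirs.txt

# git merge-file <current> <base> <other>: the result overwrites <current>
git merge-file ours.txt base.txt theirs.txt

merged=$(cat ours.txt)
echo "$merged"
```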

Moving on

Still, this was a simple case of a 3-way merge. John and Paul created different songs, so each of them touched a different file. It was pretty straightforward to execute the merge.

What about more interesting cases?

Let's assume that now John and Paul are co-authoring a new song.

So, John checked out main branch and started writing the song:

git checkout main