How to Create Objects in Git

Omer Rosenbaum · December 15, 2020 · About 5 min

Let's start by creating an object and writing it into git's object database, which resides within .git\objects. We'll find the SHA-1 hash value of a blob by using our first plumbing command, git hash-object, in the following way:

On Windows:

ECHO git is awesome | git hash-object --stdin

On UNIX:

echo "git is awesome" | git hash-object --stdin

By using --stdin we are instructing git hash-object to take its input from the standard input. This will provide us with the relevant hash value.
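To see where that value comes from, here is a minimal sketch that reproduces the hash by hand. It relies on a detail not covered in this post: git hashes a short header (the object type, the content size in bytes, and a NUL byte) followed by the content itself. The sketch assumes a UNIX shell, git's default SHA-1 object format, and that `sha1sum` from GNU coreutils is installed.

```shell
# Ask git for the blob hash; echo appends a newline, so the content is 15 bytes.
git_hash=$(echo "git is awesome" | git hash-object --stdin)

# Rebuild the same value by hand: "blob <size>", a NUL byte, then the content.
# Assumption: sha1sum (GNU coreutils) is available on this machine.
manual_hash=$(printf 'blob 15\0git is awesome\n' | sha1sum | cut -d ' ' -f 1)

echo "git hash-object: $git_hash"
echo "manual SHA-1:    $manual_hash"
```

Both lines should print the same value, which suggests the hash depends only on the object type, its size, and its bytes, not on any file name.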

In order to actually write that blob into git’s object database, we can simply add the -w switch for git hash-object. Then, we can check the contents of the .git folder, and see that they have changed.
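As a rough sketch of the difference the -w switch makes, the following runs both variants in a throwaway repository. The temporary directory and the variable names are assumptions made for illustration; `git cat-file -e` is used only to check whether an object exists.

```shell
# Work in a scratch repository so nothing existing is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Without -w the hash is only computed; nothing is stored.
blob_hash=$(echo "git is awesome" | git hash-object --stdin)
if git cat-file -e "$blob_hash" 2>/dev/null; then before=stored; else before=missing; fi

# With -w the blob is also written into the object database.
echo "git is awesome" | git hash-object --stdin -w >/dev/null
if git cat-file -e "$blob_hash" 2>/dev/null; then after=stored; else after=missing; fi

echo "before -w: $before, after -w: $after"
# List the files that now exist under the object database.
find .git/objects -type f
```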

We can now see that the hash of our blob is 54f6...36. We can also see that a directory named 54 has been created under .git\objects, and within it, a file by the name of f6...36.

So git actually takes the first two characters of the SHA-1 hash and uses them as the name of a directory. The remaining characters are used as the filename for the file that actually contains the blob.

Why is that so? Consider a fairly big repository, one that has 300,000 objects (blobs, trees, and commits) in its database. Looking up a hash in a flat list of 300,000 hashes can take a while, so git divides that problem by 256. To look up the hash above, git first looks for the directory named 54 inside .git\objects, which may hold up to 256 such directories (00 through ff). It then searches only within that directory, which narrows the search considerably.
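The directory layout described above can be sketched by splitting the hash ourselves and checking that the resulting path exists. This assumes a UNIX shell and a freshly written (loose, not yet packed) object; the variable names are illustrative.

```shell
# Scratch repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)

# First two characters name the directory, the rest names the file.
dir_name=$(printf '%s' "$blob_hash" | cut -c 1-2)
file_name=$(printf '%s' "$blob_hash" | cut -c 3-)
object_path=".git/objects/$dir_name/$file_name"

echo "directory: $dir_name"
echo "file:      $file_name"
ls -l "$object_path"
```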

Back to our process of generating a commit. We have now created an object. What is the type of that object? We can use another plumbing command, git cat-file -t (-t stands for “type”), to check that out:

Not surprisingly, this object is a blob.

We can also use `git cat-file -p` (`-p` stands for “pretty-print”) to see its contents:
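Since the screenshots are not reproduced here, the two inspection commands can be tried with a short sketch like this one. It assumes a UNIX shell and a scratch repository; the variable names are made up for the example.

```shell
# Scratch repository with one blob in it.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)

# -t reports the object's type, -p pretty-prints its contents.
object_type=$(git cat-file -t "$blob_hash")
object_content=$(git cat-file -p "$blob_hash")

echo "type:    $object_type"
echo "content: $object_content"
```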

This process of creating a blob usually happens when we add something to the staging area — that is, when we use git add.

Remember that git creates a blob of the entire file that is staged. Even if a single character is modified or added (as we added ! in our example before), the file has a new blob with a new hash.
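A quick way to convince ourselves of this is to hash two inputs that differ by a single character. This is only a sketch: it hashes text from standard input rather than a staged file, and the variable names are illustrative.

```shell
# Single quotes keep the shell from interpreting the exclamation mark.
hash_before=$(echo 'git is awesome' | git hash-object --stdin)
hash_after=$(echo 'git is awesome!' | git hash-object --stdin)

echo "without !: $hash_before"
echo "with !:    $hash_after"
```

The two values are unrelated to each other, even though the inputs differ by one byte.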

Will there be any change to `git status`?

Apparently, no. Adding a blob object to git’s internal database doesn’t change the status, as git doesn’t know of any tracked or untracked files at this stage.
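This can be checked with a small sketch that writes a blob into a scratch repository and then asks for the machine-readable status. `git status --porcelain` prints one line per changed or untracked path, so empty output means git sees nothing to report.

```shell
# Scratch repository with a blob written directly into the object database.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
echo "git is awesome" | git hash-object --stdin -w >/dev/null

# No working-tree file and no index entry exist, so status has nothing to list.
status_output=$(git status --porcelain)
if [ -z "$status_output" ]; then
  echo "git status reports no tracked or untracked changes"
else
  echo "$status_output"
fi
```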

We need to track this file — add it to the staging area. To do that, we can use the plumbing command git update-index, like so: git update-index --add --cacheinfo 100644 <blob-hash> <filename>.

Note: (The first value passed to --cacheinfo, 100644 here, is the 16-bit file mode as stored by git, which follows the layout of POSIX types and modes; 100644 denotes a regular, non-executable file. The details are not within the scope of this post.)
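Here is a minimal sketch of that command with the placeholders filled in. The file name my_file.txt matches the one used in this walkthrough; `git ls-files --stage`, which is not used elsewhere in this post, is another plumbing command that lists the entries recorded in the index.

```shell
# Scratch repository with one blob in it.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)

# Register the blob in the index under the name my_file.txt, as a regular file.
git update-index --add --cacheinfo 100644 "$blob_hash" my_file.txt

# Each index entry is printed as: mode, object hash, stage number, path.
index_entry=$(git ls-files --stage)
echo "$index_entry"
```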

Running the command above will result in a change to `.git`'s contents:

Can you spot the change? A new file by the name of index was created. This is it: the famous index (or staging area) is basically a file that resides at .git\index.
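The appearance of the index file can be observed with a sketch along these lines, assuming a UNIX shell and a freshly initialized scratch repository (which starts without an index file).

```shell
# Fresh scratch repository: no index file exists yet.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
if [ -f .git/index ]; then index_before=present; else index_before=absent; fi

# Staging the blob makes git create the index file.
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)
git update-index --add --cacheinfo 100644 "$blob_hash" my_file.txt
if [ -f .git/index ]; then index_after=present; else index_after=absent; fi

echo "index before update-index: $index_before"
echo "index after update-index:  $index_after"
```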

So now that our **blob** has been added to the **index**, we expect `git status` to look different, like this:

That’s interesting! Two things happened here.

First, we can see that my_file.txt appears in green, in the Changes to be committed area. That is because the index now has my_file.txt, waiting to be committed.

Second, we can see that my_file.txt also appears in red, because git believes the file my_file.txt has been deleted, and that deletion is not staged.

This happens because we added the blob with the contents git is awesome to the object database and told the index that the file my_file.txt has the contents of that blob, but we never actually created that file on disk.
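The same situation can be reproduced without colors by using the short, machine-readable status format. In this sketch the first status letter describes the index and the second describes the working directory; the scratch repository and variable names are assumptions for illustration.

```shell
# Scratch repository where the blob is staged but no file exists on disk.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)
git update-index --add --cacheinfo 100644 "$blob_hash" my_file.txt

# "A" in the first column: added to the index (staged).
# "D" in the second column: missing from the working directory (unstaged deletion).
status_line=$(git status --porcelain)
echo "$status_line"
```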

We can easily solve this by taking the contents of the blob, and writing them to our file system, to a file called my_file.txt:

As a result, it will no longer appear in red in the output of `git status`.
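One way to do this on UNIX, sketched under the same assumptions as before (a scratch repository and illustrative variable names), is to redirect the output of git cat-file -p into the file:

```shell
# Scratch repository where the blob is staged but no file exists on disk yet.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)
git update-index --add --cacheinfo 100644 "$blob_hash" my_file.txt

# Write the blob's contents into the working directory.
git cat-file -p "$blob_hash" > my_file.txt

# Only the staged addition remains; the second status column is now blank.
status_line=$(git status --porcelain)
echo "$status_line"
```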

So now it’s time to create a commit object from our staging area. As explained above, a commit object has a reference to a tree, so we need to create a tree.

We can do this with the command git write-tree, which records the contents of the index in a tree object. Of course, we can use git cat-file -t to see that it’s indeed a tree:

And we can use `git cat-file -p` to see its contents:
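Put together, the tree-creation step might look like the following sketch. It rebuilds the staged state in a scratch repository first so that it can run on its own; the variable names are illustrative.

```shell
# Scratch repository with my_file.txt staged in the index.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)
git update-index --add --cacheinfo 100644 "$blob_hash" my_file.txt

# Record the current index as a tree object and keep its hash.
tree_hash=$(git write-tree)

# Inspect the new object: its type, then its entries (mode, type, hash, name).
tree_type=$(git cat-file -t "$tree_hash")
tree_content=$(git cat-file -p "$tree_hash")
echo "type: $tree_type"
echo "$tree_content"
```

The single entry in the tree points at the blob hash we staged earlier, under the name my_file.txt.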

Great, so we created a tree, and now we need to create a commit object that references this tree.

To do that, we can use `git commit-tree <tree-hash> -m <commit message>`

You should now feel comfortable with the commands used to check the created object’s type, and print its contents:

Note that this commit doesn’t have a parent, because it’s the first commit. When we add another commit we will have to declare its parent — we will do so later.
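The commit step can be sketched as follows. The author and committer identity set here is a hypothetical placeholder, added only so that git commit-tree can run on a machine without a configured user; the commit message is likewise made up.

```shell
# Scratch repository with my_file.txt staged and written out as a tree.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)
git update-index --add --cacheinfo 100644 "$blob_hash" my_file.txt
tree_hash=$(git write-tree)

# Hypothetical identity, so the example does not depend on global git config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Create a commit object that references the tree; no -p, so it has no parent.
commit_hash=$(git commit-tree "$tree_hash" -m "Commit created with plumbing")

commit_type=$(git cat-file -t "$commit_hash")
echo "type: $commit_type"
git cat-file -p "$commit_hash"
```

The printed commit starts with a tree line, followed by author, committer, and the message. A later commit would add a parent line, which `git commit-tree` accepts through its `-p` option.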

The last hash that we got, 80e...8f, is a commit’s hash. We are actually very used to using these hashes, since we look at them all the time. Note that this commit references a tree object, which has its own hash that we rarely specify explicitly.

Will something change in `git status`?

Nope 🤔.

Why is that? Well, to know that our file has been committed, git needs to know about the latest commit. How does git do that? It goes to the HEAD:

Looking at `HEAD` on Windows

Looking at `HEAD` on UNIX

HEAD points to master, but what is master? We haven’t really created it yet.
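On UNIX, the state of HEAD in a brand-new repository can be inspected with a sketch like this. The default branch name depends on the git version and configuration (it may be master or something else), so the sketch reads it from HEAD instead of assuming it. It also assumes git's default file-based ref storage.

```shell
# Fresh scratch repository with no commits.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# HEAD is a small text file that names the branch it points to.
cat .git/HEAD

# Resolve that name, e.g. refs/heads/master, and check whether the branch file exists.
head_ref=$(git symbolic-ref HEAD)
if [ -e ".git/$head_ref" ]; then branch_file=present; else branch_file=absent; fi
echo "HEAD points to $head_ref, and that file is $branch_file"
```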

As we explained earlier in this post, a branch is simply a named reference to a commit. And in this case, we would like master to refer to the commit with the hash 80e8ed4fb0bfc3e7ba88ec417ecf2f6e6324998f.

We can achieve this by simply creating a file at .git\refs\heads\master, with the contents of this hash, like so:

Note: In sum, a branch is just a file inside .git\refs\heads, containing the hash of the commit it refers to.

Now, finally, `git status` and `git log` seem to appreciate our efforts:
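For reference, the whole walkthrough can be replayed on UNIX with a sketch like the one below. It assumes a scratch repository, git's default file-based ref storage, and a hypothetical author identity; the branch name is read from HEAD rather than assumed to be master, and the commit hash will differ from the one shown in this post.

```shell
# Scratch repository and a hypothetical identity for the commit.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# Blob -> index -> working file -> tree -> commit.
blob_hash=$(echo "git is awesome" | git hash-object --stdin -w)
git update-index --add --cacheinfo 100644 "$blob_hash" my_file.txt
git cat-file -p "$blob_hash" > my_file.txt
tree_hash=$(git write-tree)
commit_hash=$(git commit-tree "$tree_hash" -m "Commit created with plumbing")

# Create the branch by hand: a file under .git/refs/heads holding the commit hash.
head_ref=$(git symbolic-ref HEAD)
echo "$commit_hash" > ".git/$head_ref"

# Both commands now see the commit.
git --no-pager log --oneline
git status
```

Writing the file directly mirrors what this post does. In everyday scripts, the plumbing command `git update-ref` is the more robust way to set a ref.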

We have successfully created a commit without using porcelain commands! How cool is that? 🎉