Git Objects — blob, tree and commit
It is very useful to think about git as maintaining a file system, and specifically as maintaining snapshots of that system in time.
A file system begins with a root directory (in UNIX-based systems, /), which usually contains other directories (for example, /usr or /bin). These directories contain other directories and/or files (for example, /usr/1.txt).
In git, the contents of files are stored in objects called blobs (binary large objects).
The difference between blobs and files is that files also contain meta-data. For example, a file “remembers” when it was created, so if you move that file into another directory, its creation time remains the same.
Blobs, on the other hand, are just contents — binary streams of data. A blob doesn’t register its creation date, its name, or anything but its contents.
Every blob in git is identified by its SHA-1 hash. A SHA-1 hash consists of 20 bytes, usually represented by 40 characters in hexadecimal form. Throughout this post we will sometimes show just the first few characters of that hash.
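As a minimal sketch of how such an identifier comes about, the Python snippet below computes the ID of a blob. It relies on the fact that git hashes a short header (the word blob, the content length and a NUL byte) followed by the content itself. The function name git_blob_hash is our own and is not part of git.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID git would assign to a blob holding `content`."""
    # git prepends "blob <length>\0" to the content before hashing it.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

digest = git_blob_hash(b"HELLO WORLD")
print(digest)                      # the full ID: 40 hexadecimal characters
print(digest[:5])                  # an abbreviated form, as used in this post
print(len(bytes.fromhex(digest)))  # the raw hash is 20 bytes long
```

Only the contents go into the hash. No file name or timestamp is involved, which is why a blob carries no such information.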
In git, the equivalent of a directory is a tree. A tree is basically a directory listing, referring to blobs as well as other trees.
Trees are identified by their SHA-1 hashes as well. Referring to these objects, either blobs or other trees, happens via the SHA-1 hash of the objects.
Note that the tree CAFE7 refers to the blob F92A0 as pic.png. In another tree, that same blob may have another name.
The diagram above is equivalent to a file system with a root directory that has one file at /test.js, and a directory named /docs with two files: /docs/pic.png and /docs/1.txt.
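To make the listing idea concrete, here is a rough Python sketch that serializes this example file system as tree objects. It assumes git's usual tree layout: one entry per item, made of a mode, a name, a NUL byte and the raw 20-byte ID of the blob or tree it refers to. The helper names and the file contents are made up for illustration, so the resulting IDs will not match the abbreviated ones used in this post.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Hash a git object: '<kind> <length>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries the way a tree object stores them."""
    def sort_key(entry):
        mode, name, _ = entry
        # git orders entries by name, treating directories as if they ended in "/".
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")
    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)  # the reference is the raw 20-byte hash
    return body

# Hypothetical file contents for the example layout.
pic = hash_object("blob", b"(image bytes would go here)")
txt = hash_object("blob", b"HELLO WORLD")
js = hash_object("blob", b"console.log('test');\n")

docs = hash_object("tree", tree_body([("100644", "pic.png", pic),
                                      ("100644", "1.txt", txt)]))
root = hash_object("tree", tree_body([("100644", "test.js", js),
                                      ("40000", "docs", docs)]))
print("docs tree:", docs[:5])
print("root tree:", root[:5])
```

The file name appears only inside the tree entry. The blob ID is computed from the contents alone, so the same blob can be listed under a different name in another tree.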
Now it’s time to take a snapshot of that file system — and store all the files that existed at that time, along with their contents.
In git, a snapshot is a commit. A commit object includes a pointer to the main tree (the root directory), as well as other meta-data such as the committer, a commit message and the commit time.
In most cases, a commit also has one or more parent commits, that is, the previous snapshot(s). Of course, commit objects are also identified by their SHA-1 hashes. These are the hashes we are used to seeing when we use git log.
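A commit object is a small piece of text, and the following Python sketch builds one by hand to show which fields feed into its hash. It assumes the usual layout: a tree line, zero or more parent lines, author and committer lines, a blank line, then the message. The identity, the timestamps and the tree ID are placeholders, and the helper names are our own.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Hash a git object: '<kind> <length>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parents, author, committer, message) -> bytes:
    """Assemble the text of a commit object."""
    lines = ["tree " + tree]
    lines += ["parent " + parent for parent in parents]
    lines.append("author " + author)
    lines.append("committer " + committer)
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")

# Placeholder values: git's well-known empty-tree ID stands in for a real
# root tree, and the identity is made up.
ROOT_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
ALICE = "Alice <alice@example.com> 1700000000 +0000"

first = hash_object("commit", commit_body(ROOT_TREE, [], ALICE, ALICE, "Hello"))
second = hash_object(
    "commit", commit_body(ROOT_TREE, [first], ALICE, ALICE, "Add exclamation mark")
)
print("first commit: ", first)
print("second commit:", second, "(parent:", first[:7] + ")")
```

Because the identity, the timestamp and the parent IDs are all part of the hashed text, changing any one of them yields a different commit hash, even when the tree is identical.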
Every commit holds the entire snapshot, not just diffs from the previous commit(s).
How can that work? Doesn’t that mean that we have to store a lot of data with every commit?
Let’s examine what happens if we change the contents of a file. Say that we edit 1.txt and add an exclamation mark, changing the content from HELLO WORLD to HELLO WORLD!.
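Well, this change would mean that we have a new blob, with a new SHA-1 hash. This makes sense, as sha1("HELLO WORLD") is different from sha1("HELLO WORLD!"). (Strictly speaking, git hashes a short header, consisting of the object type and the content length, followed by the content, but the conclusion is the same.)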
Since we have a new hash, the tree’s listing must also change. After all, our tree no longer points to blob 73D8A, but to blob 62E7A. As we change the tree’s contents, we also change its hash.
And now, since the hash of that tree is different, we also need to change the parent tree — as the latter no longer points to tree CAFE7, but rather tree 24601. Consequently, the parent tree will also have a new hash.
Almost ready to create a new commit object, and it seems like we are going to store a lot of data — the entire file system, once more! But is that really necessary?
Actually, some objects, specifically blob objects, haven’t changed since the previous commit — blob F92A0 remained intact, and so did blob F00D1.
So this is the trick — as long as an object doesn’t change, we don’t store it again. In this case, we don’t need to store blob F92A0 and blob F00D1 once more. We only refer to them by their hash values. We can then create our commit object.
Since this commit is not the first commit, it has a parent — commit A1337.
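The saving can be demonstrated with a toy content-addressed store. The Python sketch below is a simplified model and not git's real on-disk format: trees are stored as plain text listings, commits are left out, and the names store, put and write_tree are invented for this example. Objects are keyed by their hash, so writing an unchanged object a second time adds nothing.

```python
import hashlib

store = {}  # object ID -> stored bytes; stands in for git's object database

def put(kind: str, body: bytes) -> str:
    """Store an object under its hash and return the ID. Existing IDs are reused."""
    data = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
    oid = hashlib.sha1(data).hexdigest()
    store.setdefault(oid, data)
    return oid

def write_tree(directory: dict) -> str:
    """Store a nested dict (name -> bytes or dict) and return the ID of its tree."""
    lines = []
    for name in sorted(directory):
        value = directory[name]
        if isinstance(value, dict):
            lines.append(f"tree {write_tree(value)} {name}")
        else:
            lines.append(f"blob {put('blob', value)} {name}")
    # Simplified text listing; real git trees use a binary layout.
    return put("tree", "\n".join(lines).encode("utf-8"))

# Hypothetical contents for the example file system.
snapshot1 = {
    "test.js": b"console.log('test');\n",
    "docs": {"pic.png": b"(image bytes)", "1.txt": b"HELLO WORLD"},
}
root1 = write_tree(snapshot1)
count1 = len(store)
print(count1)  # 5: three blobs and two trees

# Second snapshot: only 1.txt changes.
snapshot2 = {
    "test.js": snapshot1["test.js"],
    "docs": {**snapshot1["docs"], "1.txt": b"HELLO WORLD!"},
}
root2 = write_tree(snapshot2)
print(len(store) - count1)  # 3: the new blob, the new docs tree, the new root tree
```

The blobs for test.js and pic.png are written once and referenced by both snapshots. Only the objects on the path from the changed file up to the root are new.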
So to recap, we introduced three git objects:
- blob — contents of a file.
- tree — a directory listing (of blobs and trees).
- commit — a snapshot of the working tree.
Let us consider the hashes of these objects for a bit. Let’s say I wrote the string git is awesome! and created a blob from it. You did the same on your system. Would we have the same hash?
The answer is — Yes. Since the blobs consist of the same data, they’ll have the same SHA-1 values.
What if I made a tree that references the blob of git is awesome!, and gave it a specific name and metadata, and you did exactly the same on your system. Would we have the same hash?
Again, yes. Since the tree objects are the same, they would have the same hash.
What if I created a commit of that tree with the commit message Hello, and you did the same on your system. Would we have the same hash?
In this case, the answer is no. Even though our commit objects refer to the same tree, they have different commit details: the author and committer identities, the commit time, and so on.