Git Internals - in a nutshell





People often use a tool or technology on a daily basis, sometimes for years, without truly understanding it in depth or knowing how it really works. In this blog post, we will take a closer look at the wonderful world of Git.

Let's start by establishing what Git is
Well, if we ask Dr. Google we will probably get the following answer: "Git is a distributed version control and version management system". Now, let's break down what we just read and see if we can find a more reader-friendly explanation.

So, here is the deal. Git is the most widely used version control system in the world and, perhaps surprisingly, it is a well-maintained and hugely popular open source project.

Why do we need Git?
Well, first of all, Git tracks changes to files by taking snapshots of the project's files. Try to imagine that, instead of source code, you and your team are all editing the same shared document without any version history. What would happen? You would basically overwrite each other's changes. Think about it: what is source code, really? It is just a bunch of files and folders that different people need to be able to work with at the same time.

How does it work?
Basically, a standard Git workflow starts by initializing a local working directory as a repository. In that repository, Git keeps track of the files within your working directory. You first stage your changes and then commit them, which records a final snapshot (a commit). Eventually, the commits should be pushed to a remote repository, which allows easy collaboration.

Git internals
When we first initialize a local Git repository (either to convert an existing project in the file system or to start an empty repository), we actually create a new .git subdirectory. This new folder contains everything Git needs to apply its workflows.

Let's dive in...
So what does this mysterious folder contain?
For demonstration purposes, I have created an empty repo on my machine and named it "mySuperCoolRepo".
Executing this command on a Linux-based OS lists the hidden entries in our empty repo, including the .git subdirectory:

ls -ld .?* 


Inside our .git folder we find more surprises:



Once we step inside this folder, we see that it contains the necessary metadata, the objects and refs directories, template files, and a HEAD file, which points to the currently checked-out branch (or directly to a commit when HEAD is detached).
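
To make the role of HEAD a bit more concrete, here is a minimal Python sketch of how its content can be interpreted. The function name describe_head is hypothetical (it is not part of Git), and the sketch assumes the usual plain-text format: either a symbolic reference such as "ref: refs/heads/main" or a bare commit hash when HEAD is detached.

```python
def describe_head(head_text: str) -> str:
    """Interpret the text stored in .git/HEAD (hypothetical helper)."""
    content = head_text.strip()
    if content.startswith("ref: "):
        # Symbolic ref: HEAD names a branch, and the branch file holds the commit.
        ref = content[len("ref: "):]
        return f"on ref {ref}"
    # Otherwise HEAD holds a commit hash directly ("detached HEAD").
    return f"detached at commit {content}"


if __name__ == "__main__":
    # Example input only; the branch name is an assumption.
    print(describe_head("ref: refs/heads/main\n"))
```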



Next up...

  • I'm going to remove the template sample files just to make the demonstration clearer.
  • I'm also going to watch the .git subdirectory so we can see what actually happens inside Git "live".
  • I'm creating a file called "mytextfile.txt" and adding some text to it.




  • Now, let's stage our file.



Once I staged my file, we can see in the above screenshot that an object was created.
Let's take a look at this object.
When I print it in my terminal, you can see that it is a compressed binary file, so I will use a shell command to view its content.
As you can see in the below screenshot, this is, in fact, the content of the "mytextfile.txt" I staged.
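
The same inspection can be sketched in Python. Loose objects are zlib-compressed, and the decompressed bytes start with a small header ("<type> <size>" followed by a NUL byte) and then the content. The helper name parse_loose_object is an assumption made for this illustration, and the sample data is simulated rather than taken from the repo in this post.

```python
import zlib
from typing import Tuple


def parse_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    # The header ends at the first NUL byte; everything after it is content.
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


if __name__ == "__main__":
    # Simulate what Git would store for a file containing "hello world\n".
    stored = zlib.compress(b"blob 12\x00hello world\n")
    print(parse_loose_object(stored))
```

To try it on a real object, read one of the files under .git/objects in binary mode and pass its bytes to the function.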



So what actually happened is that my file has been compressed (with zlib) and stored on disk under a name derived from its SHA-1 (Secure Hash Algorithm 1) hash. The object contains all the data the original file contained without even mentioning the original file name (which is stored in the index).
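
Here is a rough Python sketch of how such an object name can be computed, assuming a repository that uses the default SHA-1 object format. Git hashes a short header plus the file content, and the first two hex digits of the result become a directory name under .git/objects. The function names are made up for this example, and the sample text is not the actual content of my file.

```python
import hashlib
import zlib


def blob_object(content: bytes) -> bytes:
    """Build the bytes Git hashes for a blob: 'blob <size>' + NUL + content."""
    return b"blob " + str(len(content)).encode("ascii") + b"\x00" + content


def blob_id(content: bytes) -> str:
    """Return the 40-character SHA-1 object ID for the given file content."""
    return hashlib.sha1(blob_object(content)).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex digits name the directory, the remaining 38 the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    content = b"hello world\n"  # example text only
    oid = blob_id(content)
    print(oid)
    print(loose_object_path(oid))
    # What lands on disk is the zlib-compressed form of the same bytes.
    print(len(zlib.compress(blob_object(content))), "compressed bytes")
```

Note that the file name never enters the hash, which is why two files with identical content share a single blob.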


The index is a file (.git/index) that Git uses to keep track of its three main areas: the working directory, the staging area, and the committed repository. For each staged path it records which blob holds the staged content. Once changes are added to the staging area, Git creates the blob object and updates the index file (like we saw happening in the above example).
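
The index is a binary file, so it cannot simply be printed. As a small illustration, the sketch below reads only its 12-byte header: the signature "DIRC", a version number, and the number of entries, all in big-endian byte order. The function name parse_index_header is hypothetical, and the per-entry records that follow the header are deliberately left out.

```python
import struct
from typing import Tuple


def parse_index_header(data: bytes) -> Tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


if __name__ == "__main__":
    # Simulated header for a version 2 index with a single staged file.
    sample = struct.pack(">4sII", b"DIRC", 2, 1)
    print(parse_index_header(sample))
```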

So, the next step would be committing our changes, right? Let's do that...


So, what just happened here?
We see that there are now 3 objects on disk: the blob we created when staging, plus two new ones. What are these objects?
Let's open the last object...





So, we learn from this that Git creates a tree object, which maps file names on disk to their blob objects. (You might have seen Git being illustrated as a tree-like structure).
The type of the object we just opened is "commit".
A commit object holds a reference to a tree, and we can access that commit object and view its details.
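
To show how the three objects hang together, here is a simplified Python sketch that builds a blob, a tree, and a commit in memory and computes their IDs. It assumes the SHA-1 object format and a flat tree that contains only regular files. The helper names, the author identity, the timestamp, and the commit message are all made up for illustration.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body, the scheme shared by all object types."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """entries: list of (mode, name, object_id_hex) tuples for plain files."""
    body = b""
    # Plain name ordering is enough for a flat list of files; directories
    # need extra care that this sketch does not handle.
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        # Each entry is '<mode> <name>' + NUL + the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(oid)
    return body


def commit_body(tree_id: str, ident: str, message: str) -> bytes:
    """A first commit: tree reference, author, committer, blank line, message."""
    return (
        f"tree {tree_id}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")


if __name__ == "__main__":
    blob = object_id("blob", b"hello world\n")
    tree = object_id("tree", tree_body([("100644", "mytextfile.txt", blob)]))
    ident = "Jane Doe <jane@example.com> 1700000000 +0000"  # hypothetical
    commit = object_id("commit", commit_body(tree, ident, "first commit"))
    print("blob  ", blob)
    print("tree  ", tree)
    print("commit", commit)
```

The commit points to the tree, and the tree points to the blob, so changing a single byte in the file changes all three IDs.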



In conclusion
This is an overview of Git internals in a nutshell and a quick introduction to how it works "under the hood". I always say that a lifetime isn't enough to learn everything I would like to, and Git is no exception.








