How I Use Git LFS to Manage Large Git Files?
Introduction
Git is a powerful version control system that allows us to track changes to any file and easily revert to any version. However, when it comes to storing large files, Git struggles because Git is not designed for storing large files (images, videos, music, etc.).
Imagine a 1MB
image that has been modified 10 times; Git would store 10 versions of the image, which amounts to 10 MB
in the repository! (Although this simplified explanation is not entirely accurate, as Git stores the differences between two versions rather than a complete copy of each version, the challenge of compressing differences between binary files remains.) This can lead to performance issues, wasted storage space, and excessive network resource consumption. Git Large File Storage is an extension of Git specifically designed to address these issues.
How Does Git LFS Solve the Problem?
Git LFS solves the problem by storing the contents of large files on an external server instead of directly in the Git repository. By replacing the actual large files with pointers to the large files, the size of the Git repository does not increase, regardless of how large the files are or how many times they are modified.
In addition to improved storage, syncing large files to the local machine becomes easier, and compared to downloading the entire history of a Git repository, using LFS allows us to request only the currently necessary large files to save resources.
How to Use Git LFS?
1. Download Git LFS
First, you need to install Git LFS (obviously 😐)
2. Configure Files to Use Git LFS
In the Git repository where you want to use Git LFS, choose the file types you want Git LFS to manage (or edit .gitattributes
directly). You can configure new file extensions that need LFS intervention at any time. The following setting informs LFS to handle all files with the .png
extension.
After the configuration, ensure that the changes to .gitattributes
are committed to the Git repository.
3. Commit Files
If you, like me, use GitHub to host your code, you can simply commit the files as usual, and GitHub will handle the rest for you 🤯. When previewing LFS files on GitHub, there will also be annotations, and the experience is completely consistent with using Git without LFS.
If you want to list the files currently managed by LFS, you can run the following command:
You will get a detailed list of LFS-managed files as follows:
Experience After Using Git LFS
Currently, all large files on the website are hosted through GitHub LFS, significantly simplifying my workflow and improving efficiency.
In the early days of building this site, it required setting up an additional CMS server or using third-party asset hosting services like Cloudinary to bring images into articles. But now, as long as the images are placed in the Git repository, they can be used directly. This is an elegant and streamlined process for a website like WebDong, which is solely managed by developers (GitHub acts like my Git-Based CMS).
Even if your website is not managed by developers, you can still use Git LFS to manage large files, such as some Open Graph images, font files… Having all resources managed in one place is also very convenient.
Things to Note When Using Git LFS
Git LFS Service Is Not Unlimited and Free
You can check bill for Git Large File Storage and Viewing your Git Large File Storage usage or the pricing details of LFS providers to learn more.
Taking the entire Web Dong blog as an example, there are a total of 251
LFS files, including various videos, images, and fonts, which have only used 43.2 MB
of space. This is, of course, because I have been mindful and compressed these files in advance, but I must say that the monthly 1 GB free quota is already sufficient for many scenarios.
CI/CD Process Considerations
Since Git LFS stores large files on additional servers, extra configuration is needed in the CI/CD process to download these LFS files. This point has been practiced in my blog, and there is also additional caching setup to avoid fetching the same LFS files repeatedly.
Conclusion
I wrote this article because the team previously did not have the habit of adopting Git LFS, and some large web assets were still stored directly in the Git repository. With architecture updates and the introduction of a Monorepo structure, it is expected that project size will increase sharply, prompting me to hope to introduce Git LFS early on to prevent the issue of the Git repository exploding.
I believe this technology is particularly suitable for use in certain fields, such as front-end or game development, where interaction with a large number of binary files is required.