Git concept: Submodule creates a link; subtree makes a copy

Once in a while, you want to combine several GitHub repositories, for use cases such as:

  1. Organization: Put related repos into one master repository for easy access and reference
  2. Merge code from two repos: Merge changes from two repositories together, keeping the history of both
  3. Reuse code from another repo: Use code from another repo (for example, a library) as part of your own project, and be able to pull the upstream changes into your own repo.

One inevitably comes upon two Git tools when looking for solutions to the above: subtree and submodule. Both seem able to handle these tasks, but they work differently. What do they do? How do they differ? How do they work, and how do you use them?

Under the hood: subtree and submodule

  • Both get the code into a master repository with the commit history.

Difference:

  1. “submodule is a link; subtree is a copy”

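A submodule links to another repository. You can still pull updates from the original repository, and push changes back to it, if you want.

A subtree keeps no link to the original repository, and the copied directory is no longer an independent Git repository. Git records no relationship between the original and the copy, so the original can be deleted once the subtree copy is done. (You can still exchange changes later with git subtree pull and git subtree push, but you have to supply the repository URL each time.)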

2. What is copied over: a submodule directory has a .git entry; a subtree directory does not.

  • subtree makes a copy of the original repo's files and commit history. The copied directory has no .git of its own.
  • submodule has a .git entry in its folder and is still an independent Git repo. (In recent Git versions this .git is a small file pointing to the repository data kept under the master repository's .git/modules directory.) In addition, when you add a submodule, two entries are added to the master repository: a .gitmodules file, which records the submodule's path and URL, and an entry for the submodule directory itself, which records the exact commit checked out there. Together they link to the original repository and track which version of it is in use.
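To see those link entries on disk, here is a minimal sketch. It assumes a throwaway local repository standing in for the original repo; the names lib, super, and vendor/lib are hypothetical, and protocol.file.allow=always is only there because the example adds the submodule from a local path rather than a real URL.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commits work anywhere (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "lib" stands in for the original repository.
git init -q "$work/lib"
git -C "$work/lib" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: first commit"

# "super" is the master repository.
git init -q "$work/super"
git -C "$work/super" symbolic-ref HEAD refs/heads/main
git -C "$work/super" commit -q --allow-empty -m "super: first commit"

# Add lib as a submodule under vendor/lib and commit the result.
git -C "$work/super" -c protocol.file.allow=always \
    submodule --quiet add "$work/lib" vendor/lib
git -C "$work/super" commit -q -m "Add lib as a submodule"

# .gitmodules records the submodule's path and URL.
cat "$work/super/.gitmodules"

# The directory entry has mode 160000: a pointer to one commit of lib.
git -C "$work/super" ls-tree HEAD vendor/lib

# The submodule folder carries its own .git entry.
ls -a "$work/super/vendor/lib"
```

The ls-tree line is the interesting one: the master repository stores a commit ID for vendor/lib, not the files themselves.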

3. subtree behaves like a branch; submodule behaves like a fork

A master repository containing multiple subtrees is still a single Git repository; the subtrees behave like branches of the master repository.

A master repository containing multiple submodules is a collection of several Git repositories. Each submodule is a separate Git repository and behaves like a forked repository.
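One way to check the "one repository versus several" claim is to ask Git where the repository root is from inside each directory. The sketch below assumes throwaway local repos; the names lib, super, copied, and linked are hypothetical, and it needs the git subtree command to be installed (it ships with most Git packages, but not all).

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commits work anywhere (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "lib" stands in for the original repository.
git init -q "$work/lib"
git -C "$work/lib" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: first commit"

# "super" is the master repository.
git init -q "$work/super"
git -C "$work/super" symbolic-ref HEAD refs/heads/main
git -C "$work/super" commit -q --allow-empty -m "super: first commit"
cd "$work/super"

# Same source repo, added once as a subtree and once as a submodule.
git subtree add --prefix=copied "$work/lib" main
git -c protocol.file.allow=always submodule --quiet add "$work/lib" linked
git commit -q -m "Add lib as a submodule"

# Ask Git for the repository root from each directory.
top_super=$(git rev-parse --show-toplevel)
top_subtree=$(git -C copied rev-parse --show-toplevel)
top_submodule=$(git -C linked rev-parse --show-toplevel)

echo "super:     $top_super"
echo "subtree:   $top_subtree"    # same root as super
echo "submodule: $top_submodule"  # its own root

# The subtree's commits are part of super's own history.
git log --format=%s
```

The subtree directory reports the master repository as its root, while the submodule directory reports itself.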

When to use subtree, when to use submodule

Use subtree when you want to:

  • Archive: Organize related old repos into one master repository, then delete the old repos.
  • Merge changes from two repositories together, keeping the history of both, then delete the old repos.

If you want to use code from another repo as part of your project, but keep it up to date with the source repo, use submodule.

Submodules can be used to add a subproject to a superproject while keeping the two working trees and the two histories separate. That leaves you free to split the two projects apart in the future.

Also, since a submodule is pinned to one specific commit, the other project can be developed independently without affecting the superproject, and the superproject moves to a new version only when you choose.
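The pinning behavior can be sketched as follows. This assumes throwaway local repos (lib, super, and vendor/lib are hypothetical names) and uses protocol.file.allow=always only because the submodule comes from a local path.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commits work anywhere (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "lib" stands in for the original repository.
git init -q "$work/lib"
git -C "$work/lib" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: first commit"

# "super" is the superproject.
git init -q "$work/super"
git -C "$work/super" symbolic-ref HEAD refs/heads/main
git -C "$work/super" commit -q --allow-empty -m "super: first commit"
cd "$work/super"

git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "Add lib as a submodule"
pinned=$(git rev-parse HEAD:vendor/lib)   # commit recorded by super

# The other project moves on independently.
echo "more" >> "$work/lib/README"
git -C "$work/lib" commit -q -am "lib: second commit"
upstream=$(git -C "$work/lib" rev-parse HEAD)

# The superproject still records the old commit.
still=$(git rev-parse HEAD:vendor/lib)
echo "pinned before: $pinned"
echo "pinned now:    $still"
echo "upstream:      $upstream"

# Move to the new version only when desired.
git -C vendor/lib -c protocol.file.allow=always fetch -q origin
git -C vendor/lib checkout -q "$upstream"
git add vendor/lib
git commit -q -m "Update lib submodule to new version"
now=$(git rev-parse HEAD:vendor/lib)
echo "after update:  $now"
```

Until the last four commands run, nothing that happens in lib changes what the superproject checks out.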

How to add a subrepo using subtree

# Add a subtree repository

git subtree add --prefix=<subdirectoryname> <url> <branch>

Example:

git subtree add --prefix=subrepo git@github.com:gingkolane/subrepo.git master

# push to GitHub
git push

# check the commit history of the main repo on GitHub: it includes all of the subrepo's commits

More subtree usage can be found in the git subtree documentation.
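If the original repo still exists and keeps changing, later updates can be merged into the copy with git subtree pull. The sketch below is a local, throwaway setup; lib, super, and vendor/lib are hypothetical names, and since a subtree stores no link, the repository location has to be given again on the pull.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commits work anywhere (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "lib" stands in for the original repository.
git init -q "$work/lib"
git -C "$work/lib" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: first commit"

# "super" is the master repository.
git init -q "$work/super"
git -C "$work/super" symbolic-ref HEAD refs/heads/main
git -C "$work/super" commit -q --allow-empty -m "super: first commit"
cd "$work/super"

# Copy lib into vendor/lib, history included.
git subtree add --prefix=vendor/lib "$work/lib" main

# The original repository gets a new commit.
echo "more" >> "$work/lib/README"
git -C "$work/lib" commit -q -am "lib: second commit"

# Merge the new upstream commits into the copy.
# -m sets the merge message so no editor is opened.
git subtree pull --prefix=vendor/lib -m "Merge lib updates" "$work/lib" main

cat vendor/lib/README
git log --format=%s
```

Adding --squash to both the add and the pull would keep the upstream history out of the master repository, at the cost of the "keep the history of both" property described earlier.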

How to add a subrepo using submodule

# add a submodule; <path> is the name of the subdirectory

git submodule add <url> <path> 

# occasionally update the submodule to a new version
# (fetch first so the new version is available locally):

git -C <path> fetch
git -C <path> checkout <new version>
git add <path>
git commit -m "update submodule to new version"

# See the list of submodules in a superproject

git submodule status 

More submodule commands can be found in the git submodule documentation.
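Because a submodule is only a link, a plain clone of the master repository leaves the submodule directory empty until the submodule is initialized. A minimal sketch, assuming throwaway local repos (lib, super, plain, and vendor/lib are hypothetical names; protocol.file.allow=always is only needed for local paths):

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commits work anywhere (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "lib" stands in for the original repository.
git init -q "$work/lib"
git -C "$work/lib" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: first commit"

# "super" is the master repository, with lib as a submodule.
git init -q "$work/super"
git -C "$work/super" symbolic-ref HEAD refs/heads/main
git -C "$work/super" commit -q --allow-empty -m "super: first commit"
git -C "$work/super" -c protocol.file.allow=always \
    submodule --quiet add "$work/lib" vendor/lib
git -C "$work/super" commit -q -m "Add lib as a submodule"

# A plain clone creates vendor/lib but does not fill it.
git clone -q "$work/super" "$work/plain"
if [ -e "$work/plain/vendor/lib/README" ]; then before=present; else before=missing; fi

# Initialize and check out the commit the master repository points to.
git -C "$work/plain" -c protocol.file.allow=always \
    submodule --quiet update --init
if [ -e "$work/plain/vendor/lib/README" ]; then after=present; else after=missing; fi

echo "README before init: $before"
echo "README after init:  $after"
```

With a real remote, git clone --recurse-submodules <url> does the clone and the submodule checkout in one step.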
