Archive Unused Repositories
IMG_6696 by BenDibble is licensed under CC BY-ND
At the time of writing, theScore has approximately 250 repositories hosted on GitHub. The majority are private, although some public ones exist as well. I suspect that software companies of sufficient size eventually accumulate a large number of repositories through new projects, libraries, experiments, and so on. An employee faced with that many repositories can waste a lot of time searching through them and working out whether a given repository is actively used.
Note: We’ll focus on GitHub as the platform of choice, although the same principles can be applied to other code hosting platforms.
Over time, activity on a repository can fade, leaving it either finished but still in active use, or unused. Someone who stumbles upon an unused repository can waste a lot of time if it’s not clear that the repository is in fact unused. Because activity faded gradually, such a repository will often still have a detailed README and possibly open issues and pull requests. In the worst case, an individual might put time and effort into addressing these unresolved issues to no benefit.
At theScore, we use GitHub Teams to help determine the ownership of repositories. As you can guess, unused repositories add to the cognitive burden on the team that owns them. Routine tasks might involve going over all the repositories that a team owns, which just eats up time if you are unsure whether a repository is used or not.
In addition, theScore uses services that periodically check for security vulnerabilities or outdated dependencies. These services keep running against unused repositories too, which leads to more noise. In some instances, the services are priced by the number of repositories or invocations, so unused repositories can lead to additional costs as well.
One option that might have crossed your mind is to just delete the repository. While that does reduce the burden of having unused repositories, I would argue that there is a wealth of knowledge in the history of the repository and on the code hosting platform (e.g., in GitHub issues and pull requests). A perfect example of where archiving makes a lot of sense is open source software (e.g., rails/actioncable): you want to retain the open contributions and communications.
To prevent future confusion and wasted effort/time/cost, we’ve made a conscious decision to archive unused repositories. Fortunately, GitHub provides a mechanism for archiving repositories. I highly recommend giving GitHub’s blog post a read, as it details how to use the feature and provides some recommendations for archiving.
The main takeaways of archiving a repository on GitHub are:
The first thing we do is create our final pull request, which adds the following notice to the top of the README:
# This repository is ⚰️ ARCHIVED ⚰️
FancyProject as a product has been sunsetted and decommissioned as the adoption was not as great as we were hoping for. Resources and efforts were moved to other projects. The last day of operation was on September 18, 2018.
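As a rough illustration of the README change, here is a minimal Python sketch that prepends a notice like the one above to existing README text. The function name `add_archive_notice` and its parameters are hypothetical and not part of any tool we use; the heading text is the one shown above.

```python
ARCHIVE_HEADING = "# This repository is ⚰️ ARCHIVED ⚰️"


def add_archive_notice(readme_text: str, reason: str) -> str:
    """Return the README text with the archive heading and reason at the top.

    Hypothetical helper: if the heading is already the first thing in the
    README, the text is returned unchanged so the notice is not duplicated.
    """
    if readme_text.startswith(ARCHIVE_HEADING):
        return readme_text
    return f"{ARCHIVE_HEADING}\n\n{reason.strip()}\n\n{readme_text}"
```

The result would still be committed through a normal pull request, so the change is reviewed like any other.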
This pull request has at least 4 major tasks in it:
We close the issues with the following message so that people know the reason why it was closed:
Closing open issues and pull requests as the project is now archived (<pr_of_final_update>).
A link to the final pull request is included in the comment. We do this so that GitHub cross-references every issue and pull request closed during the archiving process on that last pull request. You can see an example of this in the image below:
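If there are many open issues, this step could be scripted against the GitHub REST API. The sketch below only builds the two requests needed per issue (post the comment, then close it) using the standard library; it does not send anything. The helper names, the `example-org/fancy-project` repository, and the token handling are assumptions for illustration. It relies on GitHub treating pull requests as issues for the comment and state endpoints.

```python
import json
import urllib.request

API = "https://api.github.com"


def closing_message(final_pr_url: str) -> str:
    """Build the comment left on every issue and pull request being closed."""
    return (
        "Closing open issues and pull requests as the project "
        f"is now archived ({final_pr_url})."
    )


def _request(method: str, path: str, payload: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON request."""
    return urllib.request.Request(
        f"{API}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method=method,
    )


def build_close_requests(owner, repo, number, final_pr_url, token):
    """Return [comment request, close request] for one issue or pull request."""
    base = f"/repos/{owner}/{repo}/issues/{number}"
    comment = _request("POST", f"{base}/comments",
                       {"body": closing_message(final_pr_url)}, token)
    close = _request("PATCH", base, {"state": "closed"}, token)
    return [comment, close]
```

Each prepared request could then be passed to `urllib.request.urlopen`, commenting first so the explanation is in place before the issue is closed.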
Once a repository is archived, you cannot change its contributors (e.g., by adding or removing teams). In the event that a repository was only visible to a subset of developers (it sometimes happens), we make sure our developers team is a contributor before archiving. This subtle change means everyone in the organization can still see the repository in case it is needed for something.
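For reference, granting a team read access can also be done through the GitHub REST API. This is a minimal sketch that only prepares the request; the function name, the `developers` team slug, and the organization and repository names are placeholders rather than our real setup. It has to run before the repository is archived, since the team list is frozen afterwards.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_team_access_request(org, team_slug, repo, token, permission="pull"):
    """Prepare (but do not send) a request giving a team access to a repo.

    "pull" is GitHub's read-only permission level, which is enough for
    people to browse an archived repository.
    """
    url = f"{API}/orgs/{org}/teams/{team_slug}/repos/{org}/{repo}"
    return urllib.request.Request(
        url,
        data=json.dumps({"permission": permission}).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )
```

Sending the request with `urllib.request.urlopen` would apply the change, assuming the token belongs to someone with admin rights on the repository.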
As part of the archiving process, we want to disable/remove any external services that might be monitoring the repository. For example, Snyk checks for security vulnerabilities periodically, and would continue to do so unless disabled. As previously mentioned, some of these services are priced by invocations and/or the number of repositories, so it is important that this step is completed.
In the event that the repository represents a deployable project, you want to make sure you’ve taken steps to decommission every part of it. For example, there might be servers, databases, cloud storage, etc., that are linked to this repository and need to be taken care of.
Finally, with everything resolved, we can archive the repository in GitHub. Enjoy the event, as it reduces the number of repositories you have to maintain.
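The archive step itself is a button in the repository settings, but it can also be done through the GitHub REST API by setting the repository's `archived` field. Below is a minimal sketch under the same caveats as any script of this kind: the function name and the organization and repository names are made up, and the request is only prepared, not sent.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_archive_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the request that marks a repository as archived."""
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}",
        data=json.dumps({"archived": True}).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="PATCH",
    )
```

Because archiving makes the repository read-only, this should be the very last request sent, after the README update, issue cleanup, and team changes are done.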