Scanning and removing secrets from git repositories

Recently I had a situation where I wanted to turn a private git repository, public. Immediately the problem of secrets in the repository’s history was raised. I know that secrets should not be committed but, due to practical reasons it often happens. This means that before publishing a repository its secrets need to be sanitized from all commits.

Finding secrets in a repository with thousands of commits is unpractical if done by hand. To my pleasant surprise, there is an automated tool called GitGuardian which offers the service of scanning repositories for secrets. I found it successfully discovered API keys and PEM keys in a variety of files. I was really amazed, especially as some of the keys were in obscure formats. Even inside Javascript it successfully found API keys.

An example of the dashboard for the scan results of a repository

There are some false positives but the interface is really friendly and one can easily mark some issues to be ignored.

The next step is to either revoke the keys or expunge all secrets from history. Expunging the keys requires the rewriting of history and is potentially destructive but that is the point. To achieve that, I found a StackOverflow post that suggested the use of a tool called git-filter-repo.

To install the tool just run:

# Make sure that user specific binaries are in your PATH
# In my case the following command put git-filter-repo in
# $HOME/.local/bin/
# This means you might need to expand your PATH like
# export PATH=$HOME/.local/bin:PATH
python3 -m pip install --user git-filter-repo

Due to the history modifying purpose of git-filter-repo I advise that the operation be done on a fresh clone set up specifically for the purpose.

git-filter-repo takes a very simple file format even with glob support. I leave an example for glob and non-glob:

glob:Key: "*"==>Key: "@@PLACEHOLDER@@"
9yli7hjr8y0swt1==>XXXX

The first line will replace any occurrence of the string Key: “*” with Key: “@@PLACEHOLDER@@” where the * can be any text.

The second line just turns any literal entry 9yli7hjr8y0swt1 into XXXX

As you can surmise the ==> is the delimiting string.

Finally, keep in mind the following aspects:

  • GitGuardian is not infallible. If letting a secret slip is something you cannot afford then:
    • Do not publish anything at all.
    • Squash all your history into one commit. This will help for manual inspection.
  • You are rewriting history. If you make a mistake in the replace you may permanently damage your repository for your purposes.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s