Getting Files In a Project Based on Gitignore Exclusions

Date Posted: Sep. 02, 2020
Last Updated: Sep. 02, 2020

Given that so many projects use git and its .gitignore file for specifying which files should be tracked as part of the repo’s source code, it is very tempting to reuse the .gitignore file as input into various devops tools and automated scripts. How can we do this? 🤔

With Git Itself

If we have access to git and its command line interface, there are several ways that we can natively get a list of files based on the gitignore patterns.

  • One-by-one: check-ignore
  • Bulk: ls-files
    • You can use the git-ls-files command to query git for lots of info related to its file tracking
    • Example: git ls-files --full-name to list tracked files, with paths shown relative to the repository root even when run from a subdirectory
    • Limitation: Slightly complicated to capture everything you need:
      • Tracked files only: standard git ls-files works
      • Untracked only, but respect .gitignore: Need to use git ls-files --others --exclude-standard
      • BOTH tracked and untracked, and respect ignores: git ls-files --others --exclude-standard --cached
  • Bulk: ls-tree
    • You can use the git-ls-tree command for tree object based file ls operations
    • Example: git ls-tree --full-tree -r --name-only HEAD to get all file paths, from the root dir
    • Example: cd subdir && git ls-tree -r --name-only --full-name HEAD to get file paths from a subdirectory
      • Note that we drop --full-tree and add --full-name to make sure we get files only for the subdirectory, but with paths relative to the repository root rather than to the subdirectory
    • Limitation: Requires a reference to a tree-ish git object (e.g. HEAD, a commit SHA, etc.), so it only sees committed files and fails with a fatal error on a brand new repo (no commit history).
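
To see how these variants differ, here is a minimal sketch that builds a throwaway repo in a temp directory and runs each of them. It assumes git is on the PATH, and every file name in it (src/app.js, debug.log, and so on) is made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repo; every file name here is hypothetical.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"
git init -q .

printf 'build/\n*.log\n' > .gitignore
mkdir -p src build
echo 'staged'    > src/app.js
echo 'untracked' > src/new.js
echo 'ignored'   > build/out.js
echo 'ignored'   > debug.log
git add .gitignore src/app.js   # staged, but never committed

# Tracked files only
tracked=$(git ls-files | LC_ALL=C sort)
# Untracked files, minus anything the ignore rules exclude
untracked=$(git ls-files --others --exclude-standard | LC_ALL=C sort)
# Tracked + untracked, minus ignored untracked files
both=$(git ls-files --others --exclude-standard --cached | LC_ALL=C sort)

# One-by-one: check-ignore exits 0 when the path is ignored, 1 when it is not
if git check-ignore -q debug.log; then log_ignored=yes; else log_ignored=no; fi
if git check-ignore -q src/new.js; then new_ignored=yes; else new_ignored=no; fi

# ls-tree needs a commit to read from, so it fails here (HEAD does not exist yet)
if git ls-tree -r --name-only HEAD >/dev/null 2>&1; then tree_ok=yes; else tree_ok=no; fi

printf 'tracked:\n%s\n\nuntracked:\n%s\n\nboth:\n%s\n' "$tracked" "$untracked" "$both"
printf 'debug.log ignored? %s | src/new.js ignored? %s | ls-tree worked? %s\n' \
    "$log_ignored" "$new_ignored" "$tree_ok"
```

The ignored files (build/out.js and debug.log) should not show up in any of the three lists, and the ls-tree call is expected to fail because nothing has been committed yet.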

Best Git Command?

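Based on the above research, the best command to get an exhaustive list of included files, when you care about respecting .gitignore but not about tracking status, is the following. Run it from the repository root, since git ls-files only lists files under the current directory: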

git ls-files --full-name --others --cached --exclude-standard
#                |--> Show paths relative to the repo root (not absolute paths)
#                            |--> Include *untracked*
#                                     |--> Include *tracked*
#                                                   |---> Apply `.gitignore` rules (to untracked files)
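
One thing that is easy to miss: --full-name only changes how paths are printed, it does not widen the listing beyond the current directory. The sketch below is a rough illustration of that, using a throwaway repo with made-up file names. Pointing git at the top level with git -C "$(git rev-parse --show-toplevel)" is just one possible way to cover the whole project from a subdirectory.

```shell
#!/bin/sh
set -eu

# Throwaway repo; all file names are hypothetical.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

printf '*.log\n' > .gitignore
mkdir docs src
echo x > README.md
echo x > docs/guide.md
echo x > src/main.c
echo x > src/debug.log   # matches *.log, so it is ignored
git add README.md

cd src

# Without --full-name: paths relative to the current directory (src/)
relative=$(git ls-files --others --cached --exclude-standard | LC_ALL=C sort)

# With --full-name: same files, but paths relative to the repo root
full=$(git ls-files --full-name --others --cached --exclude-standard | LC_ALL=C sort)

# Whole project from inside a subdirectory: run git against the top level
top=$(git rev-parse --show-toplevel)
whole=$(git -C "$top" ls-files --full-name --others --cached --exclude-standard | LC_ALL=C sort)

printf 'relative:\n%s\n\nfull:\n%s\n\nwhole project:\n%s\n' "$relative" "$full" "$whole"
```

From inside src/, the first two listings contain only main.c (printed as main.c and src/main.c respectively), while the last one covers every non-ignored file in the repo.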

If you want the inverse of the above (only the files that the ignore rules exclude), add the --ignored flag.
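
Here is a small sketch of the --ignored variant, again in a throwaway repo with hypothetical file names. It also shows a caveat worth knowing: a file that matches an ignore pattern but was force-added with git add -f is tracked, so it still appears in the "included" list, and it only shows up in the ignored list when --cached is part of the command.

```shell
#!/bin/sh
set -eu

# Throwaway repo; all file names are hypothetical.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

printf '*.log\n' > .gitignore
echo x > app.js      # untracked, not ignored
echo x > debug.log   # untracked, ignored
echo x > keep.log    # matches *.log ...
git add -f keep.log  # ... but is force-added, so it is tracked

# Included files: tracked + untracked, with ignore rules applied to untracked files
included=$(git ls-files --others --cached --exclude-standard | LC_ALL=C sort)

# Ignored files that are untracked
ignored_untracked=$(git ls-files --others --ignored --exclude-standard | LC_ALL=C sort)

# Ignored files, tracked or not (anything matching an ignore pattern)
ignored_all=$(git ls-files --others --cached --ignored --exclude-standard | LC_ALL=C sort)

printf 'included:\n%s\n\nignored (untracked):\n%s\n\nignored (all):\n%s\n' \
    "$included" "$ignored_untracked" "$ignored_all"
```

If force-added files are not a concern in your project, the shorter --others --ignored --exclude-standard form is usually all you need.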

With Libraries

Using the Git commands above has the benefit of being portable, requiring only access to git and a terminal. However, it also comes with caveats, complexity, and the limitations of shell scripting (😬).

If you have access to Node.js, there are a lot of libraries out there that will make retrieving file lists a lot less of a headache 😅

  • Modules / API only
  • CLI
    • NPM: globby-cli (this is a wrapper around globby, above)
      • Example: npx globby-cli "images/**" --gitignore
      • Make sure you include the --gitignore flag, since .gitignore is not respected by default.
    • NPM: globstar
