Reimplement "git restore" in managed code

June requested to merge git-managed into main

Git's implementation of git restore incorrectly obtains an index.lock file, even though it never modifies the Git repository. This prevents multiple build machines from running git restore at the same time against a Git cache on a network share, even though doing so is a valid operation. There is no way to bypass this lock in Git, short of patching Git itself, which is infeasible.

The alternative is to re-implement the behaviour of git restore in managed code, which involves parsing Git packfiles and packfile indexes, reading Git commits and trees, and decompressing Git objects to extract them to disk. I think we can get a pretty performant implementation here, especially since we can use async operations to run the filesystem work in parallel.
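As a rough illustration of the packfile index lookup, here is a minimal Python sketch (not the managed implementation in this branch) of finding an object's packfile offset in a version 2 .idx file. It relies on the documented v2 layout: an 8-byte header, a 256-entry fanout table, the sorted 20-byte SHA-1 names, CRC32 values, 4-byte offsets and an optional 8-byte offset table. The function name `find_pack_offset` is hypothetical, and SHA-1 object names are assumed.

```python
import struct
from typing import Optional

IDX_MAGIC = b"\xfftOc"   # magic bytes of a version 2 pack index
HEADER_SIZE = 8          # magic (4 bytes) + version (4 bytes)
FANOUT_SIZE = 256 * 4    # 256 big-endian uint32 entries
SHA_SIZE = 20            # assumes SHA-1 object names


def find_pack_offset(idx: bytes, sha: bytes) -> Optional[int]:
    """Return the packfile offset of `sha`, or None if it is not in this index."""
    if idx[:4] != IDX_MAGIC or struct.unpack_from(">I", idx, 4)[0] != 2:
        raise ValueError("not a version 2 pack index")
    if len(sha) != SHA_SIZE:
        raise ValueError("expected a 20-byte binary SHA-1")

    # fanout[b] is the number of objects whose first byte is <= b,
    # so it narrows the binary search to one bucket of the name table.
    fanout = struct.unpack_from(">256I", idx, HEADER_SIZE)
    total = fanout[255]
    lo = fanout[sha[0] - 1] if sha[0] > 0 else 0
    hi = fanout[sha[0]]

    names_at = HEADER_SIZE + FANOUT_SIZE
    crcs_at = names_at + SHA_SIZE * total
    offsets_at = crcs_at + 4 * total
    large_at = offsets_at + 4 * total

    # Binary search over the sorted object names in [lo, hi).
    while lo < hi:
        mid = (lo + hi) // 2
        start = names_at + SHA_SIZE * mid
        name = idx[start:start + SHA_SIZE]
        if name == sha:
            (offset,) = struct.unpack_from(">I", idx, offsets_at + 4 * mid)
            if offset & 0x80000000:
                # High bit set: the low 31 bits index the 8-byte offset table.
                slot = offset & 0x7FFFFFFF
                (offset,) = struct.unpack_from(">Q", idx, large_at + 8 * slot)
            return offset
        if name < sha:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Because the lookup only reads the index bytes, any number of machines can run it against the same file concurrently without taking a lock.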

  • Implement Git packfile index parsing
  • Implement binary search for Git packfile indexes
  • Implement Git packfile parsing, including functions to get an object by SHA
  • Implement .gitignore parsing so we can tell which files in the target filesystem are tracked by Git (this lets us delete files that are removed when switching commits, while leaving ignored files alone)
  • Implement filesystem reconciliation moving from commit A to commit B (i.e. the equivalent of running git checkout B while on A).
  • Upgrade Redpoint.Reservation so it supports multi-reader, single-writer scenarios, instead of only a single reader-writer. We'll want to use Redpoint.Reservation to obtain an exclusive writer when fetching new commits into the repository (in case the git fetch operation modifies packfiles), while allowing multiple readers to use the managed git restore in parallel.
    • I think we can get this behaviour by leveraging FileShare.Read vs FileShare.None, but I need to figure out how this interacts with FileMode.Create and FileOptions.DeleteOnClose.
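
The reconciliation step in the list above can be sketched roughly as follows, in Python. This is only an illustration under simplifying assumptions, not the implementation in this branch: each commit's tree is assumed to be already flattened into a mapping from path to blob ID, file mode changes are ignored, and the names `ReconcilePlan` and `plan_reconcile` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ReconcilePlan(NamedTuple):
    to_write: List[str]   # paths to extract from commit B
    to_delete: List[str]  # paths tracked in commit A but absent from commit B


def plan_reconcile(tree_a: Dict[str, str], tree_b: Dict[str, str]) -> ReconcilePlan:
    """Plan the filesystem changes needed to move from commit A to commit B.

    Both arguments map a repository-relative path to a blob ID.
    """
    # Write anything that is new in B or whose blob ID differs from A.
    to_write = sorted(p for p, blob in tree_b.items() if tree_a.get(p) != blob)
    # Delete only paths that A tracked and B no longer contains.
    to_delete = sorted(p for p in tree_a if p not in tree_b)
    return ReconcilePlan(to_write, to_delete)
```

In this sketch, files that are untracked in both commits never appear in either mapping, so they are never scheduled for deletion. The writes and deletes in the plan are independent of each other, which is what would let them run in parallel.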
Edited by June