When you want to load data into Pachyderm without triggering a pipeline, you can upload it to a staging branch and then submit accumulated changes in one batch by re-pointing the HEAD of your master branch to a commit in the staging branch. Let's see how this works.
How to Use a Staging Branch
Create a repository. For example, data:

    pachctl create repo data

Create a master branch:

    pachctl create branch data@master

View the created branch:
    pachctl list commit data

    REPO BRANCH COMMIT                           FINISHED           SIZE ORIGIN DESCRIPTION
    data master 8090bfb4d4fe44158eac12199c37a591 About a minute ago 0B   AUTO

Pachyderm automatically created an empty HEAD commit on the new branch, as you can see from the 0B (zero-byte) size and AUTO commit origin.

Commit a file to a staging branch:
    pachctl put file data@staging -f <file>

Pachyderm automatically creates the staging branch. Your repo now has two branches, staging and master. This example uses the name staging, but you can name the branch anything you want, and you can have as many staging branches as you need.

Verify that the branches were created:
    pachctl list branch data

    BRANCH  HEAD                             TRIGGER
    staging f3506f0fab6e483e8338754081109e69 -
    master  8090bfb4d4fe44158eac12199c37a591 -

The master branch still has the same HEAD commit. No jobs have started to process the new file, because no pipeline takes staging as an input. You can continue to commit to staging to add new data to the branch, and the pipeline will not process anything.

When you are ready to process the data, update the master branch to point to the head of the staging branch:

    pachctl create branch data@master --head staging

List your branches to verify that the master branch's HEAD commit has changed:

    pachctl list branch data

    staging f3506f0fab6e483e8338754081109e69
    master  f3506f0fab6e483e8338754081109e69

The master and staging branches now have the same HEAD commit. This means that your pipeline has data to process.

Verify that the pipeline has new jobs:
    pachctl list job data@f3506f0fab6e483e8338754081109e69

    ID                               PIPELINE STARTED        DURATION           RESTART PROGRESS  DL   UL  STATE
    f3506f0fab6e483e8338754081109e69 data     32 seconds ago Less than a second 0       6 + 0 / 6 108B 24B success

You should see one job, with the same ID, that Pachyderm created for all the changes you submitted to the staging branch. While the commits to the staging branch are ancestors of the current HEAD in master, they were never the actual HEAD of master themselves, so they do not get processed individually. This behavior works for most use cases because commits in Pachyderm are generally additive, so processing the HEAD commit also processes data from previous commits.
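
The steps above can be collected into a small helper for repeated use. The following is a minimal sketch, not an official tool: it assumes the repository and its master branch already exist, and it uses only the pachctl commands shown in this walkthrough. The names stage_and_promote, run, and DRY_RUN are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Sketch only: stage several files, then promote them to master in one step.
# stage_and_promote, run, and DRY_RUN are hypothetical names for this example.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: stage_and_promote <repo> <file>...
stage_and_promote() {
  local repo="$1"
  shift
  local f
  for f in "$@"; do
    # Each put file adds a commit to staging; no pipeline reads staging,
    # so nothing is processed yet.
    run pachctl put file "${repo}@staging" -f "$f" || return 1
  done
  # Re-point master at the head of staging so the staged changes
  # reach the pipeline as a single new HEAD commit.
  run pachctl create branch "${repo}@master" --head staging
}
```

To preview the commands without contacting a cluster, set DRY_RUN=1 before calling the function, for example with placeholder file names: DRY_RUN=1 stage_and_promote data a.csv b.csv. Keeping the promotion as the last step means that however many files you stage, master moves only once.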