You may have local or network-attached storage that you want your
pipeline to write files to.
You can mount that storage as a volume in Kubernetes
and make it available in your pipeline worker by using the
`podPatch` pipeline parameter.
The `podPatch` parameter takes a string that specifies the changes
that you want to add to your existing manifest. To create
a patch, you need to generate a diff between the original ReplicationController
and a copy that contains your changes. You can use one of the online JSON patch
utilities, such as JSON Patch Generator,
to create the diff. A diff for mounting a volume might look like this:
```json
[
  {
    "op": "add",
    "path": "/volumes/-",
    "value": {
      "name": "task-pv-storage",
      "persistentVolumeClaim": {
        "claimName": "task-pv-claim"
      }
    }
  },
  {
    "op": "add",
    "path": "/containers/0/volumeMounts/-",
    "value": {
      "mountPath": "/data",
      "name": "task-pv-storage"
    }
  }
]
```

The volume mount refers to the volume by name, so both entries use `task-pv-storage`.
This output needs to be converted into a one-liner and added to the pipeline spec.
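Rather than escaping quotes by hand, you can let a JSON serializer produce the one-liner. The Python sketch below is an illustration, not part of Pachyderm's tooling: it assumes you have the volume patch as Python data (with both entries named `task-pv-storage`), serializes it to a single line with `json.dumps`, and then serializes that string once more so that every inner quote is escaped, which is the form a string-valued field such as `podPatch` takes in a JSON pipeline spec.

```python
import json

# The volume patch as Python data. The volume and the mount
# both use the name "task-pv-storage" so that they match.
patch = [
    {
        "op": "add",
        "path": "/volumes/-",
        "value": {
            "name": "task-pv-storage",
            "persistentVolumeClaim": {"claimName": "task-pv-claim"},
        },
    },
    {
        "op": "add",
        "path": "/containers/0/volumeMounts/-",
        "value": {"mountPath": "/data", "name": "task-pv-storage"},
    },
]

# First pass: the patch as a compact, single-line JSON document.
one_liner = json.dumps(patch)

# Second pass: encode that document as a JSON string value.
# json.dumps adds the surrounding quotes and escapes each inner quote as \".
escaped = json.dumps(one_liner)

print('"podPatch": ' + escaped)
```

The printed line can be pasted into the pipeline spec as is. If you edit the spec with a script instead, assigning `one_liner` to the spec's `podPatch` key and writing the file with `json.dump` performs the escaping for you.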
We will use the OpenCV example to demonstrate this functionality.
To mount a volume, complete the following steps:
1. Create a PersistentVolume and a PersistentVolumeClaim as described in Configure a Pod to Use a PersistentVolume for Storage. Modify `mountPath` and `path` as needed.

    For testing purposes, you might want to add an `index.html` file as described in Create an index.html file.

2. Get the ReplicationController (RC) manifest from your pipeline:

    ```shell
    kubectl get rc <rc-pipeline> -o json > <filename>.yaml
    ```

    Example:

    ```shell
    kubectl get rc pipeline-edges-v7 -o json > test-rc.yaml
    ```

3. Open the generated RC manifest for editing:

    1. Under `spec`, find the `volumeMounts` section.
    2. Add your volume to the list of mounts.

        Example:

        ```json
        { "mountPath": "/data", "name": "task-pv-storage" }
        ```

        `mountPath` is where your volume will be mounted inside the container.

    3. Find the `volumes` section.
    4. Add the information about the volume.

        Example:

        ```json
        { "name": "task-pv-storage", "persistentVolumeClaim": { "claimName": "task-pv-claim" } }
        ```

        In this section, you need to specify the PersistentVolumeClaim that you created in Step 1.

    5. Save these changes to a new file.

4. Copy the contents of the original RC manifest to the clipboard.
5. Go to a JSON patch generator, such as JSON Patch Generator, and paste the contents of the original RC manifest into the **Source JSON** field.
6. Copy the contents of the modified RC manifest to the clipboard.
7. Paste the contents of the modified RC manifest into the **Target JSON** field.
8. Copy the generated JSON Patch.
9. In your terminal, open the pipeline spec for editing. For example, if you are modifying the `edges` pipeline, open the `edges.json` file.
10. Add the patch as a one-liner under the `podPatch` parameter.

    Example:

    ```json
    "podPatch": "[{\"op\": \"add\",\"path\": \"/volumes/-\",\"value\": {\"name\": \"task-pv-storage\",\"persistentVolumeClaim\": {\"claimName\": \"task-pv-claim\"}}}, {\"op\": \"add\",\"path\": \"/containers/0/volumeMounts/-\",\"value\": {\"mountPath\": \"/data\",\"name\": \"task-pv-storage\"}}]"
    ```

    Because the patch is stored as a JSON string, you need to escape every double quote (`"`) inside the square brackets (`[]`) with a backslash (`\`). You might also need to modify the paths to `volumeMounts` and `volumes` by removing the `/spec/template/spec` prefix and replacing the assigned index number with a dash (`-`). For example, if a path in the generated patch is `/spec/template/spec/volumes/5`, replace it with `/volumes/-`. See the example above for details.

11. After modifying the pipeline spec, update the pipeline:

    ```shell
    pachctl update pipeline -f <pipeline-spec.yaml>
    ```

    A new pod and a new replication controller should be created with your changes.

12. Verify that your volume was mounted by connecting to your pod and listing the directory that you specified as the mount point. In this example, it is `/data`.

    Example:

    ```shell
    kubectl exec -it <pipeline-pod> -- /bin/bash
    # then, inside the container shell:
    ls /data
    ```

    If you added the `index.html` file for testing as described in Step 1, you should see that file in the mounted directory.

13. You might want to adjust your pipeline code to read from or write to the mounted directory. For example, the code in the OpenCV example reads from the `/pfs/images` directory and writes to the `/pfs/out` directory. To read from or write to the `/data` directory instead, change those paths to `/data`.
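For the first step, the Kubernetes task page it references creates a hostPath PersistentVolume named `task-pv-volume` and a claim named `task-pv-claim`. The Python sketch below builds comparable manifests as data and writes them as a single JSON `List` that `kubectl apply -f` can read. The values (the `manual` storage class, 10Gi and 3Gi sizes, and the `/mnt/data` host path) mirror that tutorial and are assumptions suited to a single-node test cluster; a real network share would use a different volume source. The output file name `pv-pvc.json` is hypothetical.

```python
import json

# PersistentVolume backed by a directory on the node
# (assumption: a single-node test cluster, as in the Kubernetes tutorial).
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "task-pv-volume", "labels": {"type": "local"}},
    "spec": {
        "storageClassName": "manual",
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteOnce"],
        "hostPath": {"path": "/mnt/data"},
    },
}

# Claim that binds to the volume above. Its name is the claimName
# that the podPatch volume entry refers to.
persistent_volume_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "task-pv-claim"},
    "spec": {
        "storageClassName": "manual",
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "3Gi"}},
    },
}

# Wrap both objects in a v1 List so that one file holds them.
manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [persistent_volume, persistent_volume_claim],
}

# Hypothetical output file; apply it with: kubectl apply -f pv-pvc.json
with open("pv-pvc.json", "w") as f:
    json.dump(manifest, f, indent=2)
```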
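The RC-editing, diffing, and path-rewriting steps above can also be scripted locally instead of going through an online generator. The Python sketch below is a hypothetical illustration, not a Pachyderm tool: `additions` is a deliberately minimal stand-in for a JSON Patch (RFC 6902) generator that only detects added keys and items appended to the end of lists, which is enough for adding a volume and a mount but is not a general diff. `to_pod_patch` keeps only changes under `/spec/template/spec`, strips that prefix, and replaces the trailing list index with `-`. The trimmed RC and the `edges` spec stub are assumptions; a real RC comes from `kubectl get rc ... -o json`.

```python
import copy
import json
import re

POD_SPEC_PREFIX = "/spec/template/spec"


def pointer_token(key):
    # Escape a key for use in a JSON Pointer path (RFC 6901).
    return str(key).replace("~", "~0").replace("/", "~1")


def additions(old, new, path=""):
    """Minimal diff: report keys added to objects and items appended to lists.

    Changed or removed values and items inserted mid-list are NOT detected.
    """
    ops = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key, value in new.items():
            child = f"{path}/{pointer_token(key)}"
            if key in old:
                ops += additions(old[key], value, child)
            else:
                ops.append({"op": "add", "path": child, "value": value})
    elif isinstance(old, list) and isinstance(new, list):
        for index, (a, b) in enumerate(zip(old, new)):
            ops += additions(a, b, f"{path}/{index}")
        for index in range(len(old), len(new)):
            ops.append({"op": "add", "path": f"{path}/{index}", "value": new[index]})
    return ops


def to_pod_patch(ops):
    """Rewrite RC-level paths into the pod-level form used by podPatch."""
    patch = []
    for op in ops:
        if not op["path"].startswith(POD_SPEC_PREFIX + "/"):
            continue  # keep only changes inside the pod template
        path = op["path"][len(POD_SPEC_PREFIX):]
        # Replace a trailing list index with "-" (append): /volumes/5 -> /volumes/-
        path = re.sub(r"/\d+$", "/-", path)
        patch.append({**op, "path": path})
    return patch


# Hypothetical, heavily trimmed RC manifest.
original_rc = {
    "spec": {"template": {"spec": {
        "volumes": [{"name": "existing-volume", "emptyDir": {}}],
        "containers": [{
            "name": "user",
            "volumeMounts": [{"name": "existing-volume", "mountPath": "/existing"}],
        }],
    }}}
}

# Add the volume and the mount to a copy of the RC.
modified_rc = copy.deepcopy(original_rc)
pod_spec = modified_rc["spec"]["template"]["spec"]
pod_spec["volumes"].append(
    {"name": "task-pv-storage", "persistentVolumeClaim": {"claimName": "task-pv-claim"}})
pod_spec["containers"][0]["volumeMounts"].append(
    {"mountPath": "/data", "name": "task-pv-storage"})

# Diff the two manifests, rewrite the paths, and embed the patch as a string.
pod_patch = to_pod_patch(additions(original_rc, modified_rc))
pipeline_spec = {"pipeline": {"name": "edges"}, "podPatch": json.dumps(pod_patch)}
print(json.dumps(pipeline_spec, indent=2))
```

To use it on real files, you would load the original and modified RC manifests with `json.load` in place of the inline `original_rc`, and merge `podPatch` into your full pipeline spec rather than the stub shown here.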
Pachyderm has no knowledge of files that were stored in the mounted directory before it was mounted to Pachyderm. Moreover, if you mount a network share to which sources other than Pachyderm write files, Pachyderm does not guarantee the provenance of those changes.