Archived Pachyderm Docs
2.8.x
Prepare Data
Prepare your data for transformation.