.. _architecture:
Pipeline Architecture
=====================
This page provides a detailed architectural overview of the AIND Ephys Pipeline, including deployment modes,
infrastructure components, and data flow.
Detailed Architecture Diagram
------------------------------
.. only:: html
.. note::
**Interactive Diagram:** Use your mouse to zoom (scroll) and pan (click + drag). All hyperlinks are clickable. A fullscreen button (⛶) may appear in the top-right corner when hovering over the diagram.
.. mermaid::
flowchart TD
%% Deployment paths
subgraph code_ocean["🌊 Code Ocean Deployment"]
direction TB
co_main["pipeline/main.nf
(Nextflow DSL1)
Code Ocean Platform"]
co_branches["Branch Selection:
• co_kilosort4 (main)
• co_kilosort25
• co_spykingcircus2
• co_*_opto variants"]
co_main -.->|"Branch determines
sorter"| co_branches
end
subgraph slurm_local["🖥️ SLURM/Local Deployment"]
direction TB
mb_main["pipeline/main_multi_backend.nf
(Nextflow DSL2)
Multi-backend Support"]
subgraph executor["⚙️ Executor"]
direction LR
slurm_exec["SLURM
Cluster execution"]
local_exec["Local
Single machine"]
end
mb_main -->|"Submitted to"| executor
end
co_main -->|"Copied from ➜"| mb_main
%% Input/Output data
input[("📥 Input Data
(Ephys Session)")]
output[("📤 Output
NWB files + QC + Viz")]
%% Hugging Face models
subgraph hf_models["🤗 Hugging Face Models (UnitRefine)"]
direction TB
noise_model["noise_neural_classifier
Noise vs. neural units"]
sua_mua_model["sua_mua_classifier
Single-unit vs. multi-unit"]
end
%% Container registry
subgraph registry["☁️ GitHub Container Registry (ghcr.io)"]
direction TB
base["aind-ephys-pipeline-base
General processing
(tag: si-0.103.0)"]
ks25["aind-ephys-spikesort-kilosort25
Kilosort 2.5 sorter
(tag: si-0.103.0)"]
ks4["aind-ephys-spikesort-kilosort4
Kilosort 4 sorter
(tag: si-0.103.0)"]
nwb["aind-ephys-pipeline-nwb
NWB export
(tag: si-0.103.0)"]
end
%% Common pipeline steps
subgraph pipeline["📊 Processing Pipeline
(SpikeInterface-based)"]
direction TB
step1["1. Job Dispatch
aind-ephys-job-dispatch
Generate parallel job JSONs
(per probe/shank)"]
step2["2. Preprocessing
aind-ephys-preprocessing
Phase shift • Highpass filter
Denoising • Motion estimation"]
step3a["3a. Kilosort2.5
aind-ephys-spikesort-kilosort25"]
step3b["3b. Kilosort4
aind-ephys-spikesort-kilosort4
(GPU required)"]
step3c["3c. SpykingCircus2
aind-ephys-spikesort-spykingcircus2"]
step4["4. Postprocessing
aind-ephys-postprocessing
Amplitudes • Locations • PCA
Correlograms • Quality metrics"]
step5["5. Curation
aind-ephys-curation
QC thresholds
UnitRefine classifier"]
step6["6. Visualization
aind-ephys-visualization
Timeseries • Drift maps
Figurl sorting summary"]
step7["7. Results Collector
aind-ephys-result-collector
Aggregate parallel outputs"]
step8["8. Quality Control
aind-ephys-processing-qc
Run QC checks"]
step9["9. QC Collector
aind-ephys-qc-collector
Aggregate QC results"]
step10["10. NWB Ecephys
aind-ecephys-nwb
Export raw/LFP data"]
step11["11. NWB Units
aind-units-nwb
Export spike sorting results"]
step1 --> step2
step2 --> step3a & step3b & step3c
step3a & step3b & step3c --> step4
step4 --> step5
step5 --> step6
step2 & step3a & step3b & step3c & step4 & step5 & step6 --> step7
step1 & step7 --> step8
step8 --> step9
step1 --> step10
step10 & step7 --> step11
end
%% Data flow
input -->|"Mounted as
capsule/data/ecephys_session"| step1
step7 & step9 & step11 -->|"Published to
RESULTS_PATH"| output
%% HF model usage
noise_model -.->|"used by"| step5
sua_mua_model -.->|"used by"| step5
%% Container usage
base -.->|"used by"| step1
base -.->|"used by"| step2
base -.->|"used by"| step4
base -.->|"used by"| step5
base -.->|"used by"| step6
base -.->|"used by"| step7
base -.->|"used by"| step8
base -.->|"used by"| step9
base -.->|"used by"| step3c
ks25 -.->|"used by"| step3a
ks4 -.->|"used by"| step3b
nwb -.->|"used by"| step10
nwb -.->|"used by"| step11
co_main -.->|"Executes"| pipeline
executor -.->|"Executes"| pipeline
%% Version control
versions["📋 capsule_versions.env
Pins Git commit hashes
for each step"]
pipeline -.->|"Version controlled
via"| versions
%% Styling
classDef deployment fill:#e1f5ff,stroke:#0066cc,stroke-width:2px
classDef pipeline_step fill:#fff4e6,stroke:#ff9800,stroke-width:2px
classDef sorter fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
classDef data fill:#e8f5e9,stroke:#4caf50,stroke-width:3px
classDef container fill:#fce4ec,stroke:#e91e63,stroke-width:2px
classDef ml_model fill:#fff9e6,stroke:#ffc107,stroke-width:2px
class co_main,mb_main,co_branches,slurm_exec,local_exec deployment
class step1,step2,step4,step5,step6,step7,step8,step9,step10,step11 pipeline_step
class step3a,step3b,step3c sorter
class input,output data
class base,ks25,ks4,nwb container
class noise_model,sua_mua_model ml_model
Architecture Components
------------------------
Deployment Modes
~~~~~~~~~~~~~~~~
The pipeline supports two deployment strategies:
**Code Ocean Deployment**
- Uses ``pipeline/main.nf`` (Nextflow DSL1)
- Branch-based sorter selection
- Separate branches for each configuration:
- ``main``/``co_kilosort4``: Kilosort4
- ``co_kilosort25``: Kilosort2.5
- ``co_spykingcircus2``: SpykingCircus2
- Plus ``*_opto`` variants with optogenetics artifact removal
**SLURM/Local Deployment**
- Uses ``pipeline/main_multi_backend.nf`` (Nextflow DSL2)
- Parameter-driven sorter selection
- Supports both SLURM clusters and local execution
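
The two selection mechanisms can be contrasted with a minimal Python sketch. The branch names are taken from the list above; the function names, the sorter labels, and the ``sorter`` parameter key are hypothetical and only illustrate the idea, they are not the pipeline's actual interface.

.. code-block:: python

    # Branch names as listed for the Code Ocean deployment.
    # The sorter labels on the right are illustrative identifiers.
    BRANCH_TO_SORTER = {
        "main": "kilosort4",
        "co_kilosort4": "kilosort4",
        "co_kilosort25": "kilosort25",
        "co_spykingcircus2": "spykingcircus2",
    }
    SUPPORTED_SORTERS = frozenset(BRANCH_TO_SORTER.values())


    def sorter_from_branch(branch):
        """Code Ocean style: the checked-out branch fixes the sorter.

        Returns (sorter, opto), where opto is True for co_*_opto branches.
        """
        opto = branch.startswith("co_") and branch.endswith("_opto")
        base = branch[: -len("_opto")] if opto else branch
        if base not in BRANCH_TO_SORTER:
            raise ValueError(f"unknown branch: {branch!r}")
        return BRANCH_TO_SORTER[base], opto


    def sorter_from_params(params):
        """SLURM/local style: a run parameter selects the sorter.

        The "sorter" key is a hypothetical parameter name.
        """
        sorter = params["sorter"]
        if sorter not in SUPPORTED_SORTERS:
            raise ValueError(f"unsupported sorter: {sorter!r}")
        return sorter

With branch-based selection, changing the sorter means switching branches; with parameter-driven selection, one workflow file serves all sorters.
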
Infrastructure Components
~~~~~~~~~~~~~~~~~~~~~~~~~~
**Container Registry**
Four container images from GitHub Container Registry (ghcr.io):
- ``aind-ephys-pipeline-base``: Used by steps 1, 2, 4-9 and SpykingCircus2
- ``aind-ephys-spikesort-kilosort25``: Kilosort2.5 sorter
- ``aind-ephys-spikesort-kilosort4``: Kilosort4 sorter (requires GPU)
- ``aind-ephys-pipeline-nwb``: NWB export steps (10-11)
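
The step-to-image assignment above can be written as a small lookup. This is a sketch only: the function name and sorter labels are hypothetical, the tag is the one shown in the diagram, and the registry namespace under ``ghcr.io`` is left out because this page does not state it.

.. code-block:: python

    IMAGE_TAG = "si-0.103.0"  # tag shown in the architecture diagram

    # Step 3 (spike sorting) is the only step whose image depends on the sorter.
    SORTER_IMAGES = {
        "kilosort25": "aind-ephys-spikesort-kilosort25",
        "kilosort4": "aind-ephys-spikesort-kilosort4",
        "spykingcircus2": "aind-ephys-pipeline-base",
    }


    def image_for_step(step, sorter=None):
        """Return "name:tag" for a pipeline step number (1-11)."""
        if step == 3:
            if sorter not in SORTER_IMAGES:
                raise ValueError(f"step 3 needs a known sorter, got {sorter!r}")
            name = SORTER_IMAGES[sorter]
        elif step in (10, 11):
            name = "aind-ephys-pipeline-nwb"
        elif 1 <= step <= 9:
            name = "aind-ephys-pipeline-base"
        else:
            raise ValueError(f"unknown step: {step!r}")
        return f"{name}:{IMAGE_TAG}"

For example, ``image_for_step(3, "kilosort4")`` gives ``aind-ephys-spikesort-kilosort4:si-0.103.0``, while every other step from 1 to 9 resolves to the base image.
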
**Machine Learning Models**
UnitRefine pretrained classifiers from Hugging Face (used in Step 5 - Curation):
- ``UnitRefine_noise_neural_classifier``: Distinguishes noise from neural units
- ``UnitRefine_sua_mua_classifier``: Classifies single-unit vs multi-unit activity
Data Flow
~~~~~~~~~
**Input**: Electrophysiology session data is mounted into each container at ``capsule/data/ecephys_session``
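**Processing**: 11 steps, with steps 2-6 parallelized per probe/shank. The steps are not strictly sequential: for example, NWB Ecephys export (step 10) depends only on Job Dispatch (step 1).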
**Output**: Results published to ``RESULTS_PATH`` including:
- Collected results of the parallel jobs: preprocessing, sorting, postprocessing, curation, and visualizations (step 7)
- Quality control reports (step 9)
- NWB files with raw/LFP data and spike sorting units (steps 10-11)
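
The step dependencies drawn in the diagram can be expressed as a graph and ordered with Python's standard ``graphlib`` module. This is an illustrative sketch: the three sorters are collapsed into a single step 3, per-probe/shank parallelism is not modeled, and the function name is hypothetical. Nextflow resolves the real execution order itself.

.. code-block:: python

    from graphlib import TopologicalSorter

    # step number -> steps it depends on (edges taken from the diagram)
    DEPENDS_ON = {
        1: set(),            # Job Dispatch
        2: {1},              # Preprocessing
        3: {2},              # Spike Sorting (any sorter)
        4: {3},              # Postprocessing
        5: {4},              # Curation
        6: {5},              # Visualization
        7: {2, 3, 4, 5, 6},  # Results Collector
        8: {1, 7},           # Quality Control
        9: {8},              # QC Collector
        10: {1},             # NWB Ecephys
        11: {7, 10},         # NWB Units
    }


    def execution_waves(depends_on):
        """Group steps into waves; steps in one wave have no mutual dependency."""
        sorter = TopologicalSorter(depends_on)
        sorter.prepare()
        waves = []
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            sorter.done(*ready)
        return waves


    if __name__ == "__main__":
        for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
            print(f"wave {number}: steps {wave}")

Under these edges, step 10 can start alongside step 2, and steps 8 and 11 both become runnable once step 7 finishes.
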
Version Control
~~~~~~~~~~~~~~~
Git commit hashes in ``capsule_versions.env`` pin exact versions of each processing step's repository,
ensuring reproducibility across pipeline runs.
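
As an illustration of how such pins could be read and validated, the sketch below parses ``KEY=VALUE`` lines. The file layout, the key names, and the hash values in the usage example are assumptions made for this sketch; the actual contents of ``capsule_versions.env`` may differ.

.. code-block:: python

    import re

    # A Git commit hash: 7 to 40 lowercase hexadecimal characters.
    _HASH_RE = re.compile(r"[0-9a-f]{7,40}")


    def parse_versions(text):
        """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
        versions = {}
        for lineno, raw in enumerate(text.splitlines(), start=1):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"line {lineno}: expected KEY=VALUE, got {raw!r}")
            key = key.strip()
            value = value.strip().strip("\"'")  # tolerate optional quotes
            if not _HASH_RE.fullmatch(value):
                raise ValueError(f"line {lineno}: {value!r} is not a commit hash")
            versions[key] = value
        return versions


    if __name__ == "__main__":
        # Hypothetical key names and placeholder hashes, for illustration only.
        sample = (
            "# pinned step versions\n"
            "JOB_DISPATCH=0123456789abcdef0123456789abcdef01234567\n"
            "PREPROCESSING=89abcdef0123\n"
        )
        print(parse_versions(sample))

Rejecting anything that is not a commit hash (a branch name, for instance) keeps a moving reference from silently replacing a fixed pin.
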
Pipeline Steps Detailed Breakdown
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. **Job Dispatch** (`aind-ephys-job-dispatch `_):
Generates a list of JSON files to be processed in parallel. Parallelization is performed over multiple probes
and multiple shanks (e.g., for NP2-4shank probes). The steps from preprocessing to visualization are run in parallel.
2. **Preprocessing** (`aind-ephys-preprocessing `_):
Phase shift, highpass filter, denoising (bad channel removal plus either a common median reference, "cmr", or a highpass spatial filter, "destripe"), and motion estimation (optionally followed by motion correction).
3. **Spike Sorting** - Several spike sorters are available:
- `Kilosort2.5 `_
- `Kilosort4 `_
- `SpykingCircus2 `_
4. **Postprocessing** (`aind-ephys-postprocessing `_):
Remove duplicate units, compute amplitudes, spike/unit locations, PCA, correlograms, template similarity,
template metrics, and quality metrics.
5. **Curation** (`aind-ephys-curation `_):
Applies quality-metric thresholds (ISI violation ratio, presence ratio, and amplitude cutoff) and a pretrained unit classifier
(`UnitRefine `_).
6. **Visualization** (`aind-ephys-visualization `_):
Timeseries, drift maps, and sorting output in `figurl `_.
7. **Result Collection** (`aind-ephys-result-collector `_):
Collects the output of all parallel jobs and copies the output folders to the results folder.
8. **Quality Control** (`aind-ephys-processing-qc `_):
Run quality control checks on the processing results.
9. **QC Collector** (`aind-ephys-qc-collector `_):
Aggregate quality control results from parallel jobs.
10. **NWB Ecephys** (`aind-ecephys-nwb `_):
Export raw/LFP electrophysiology data to NWB format.
11. **NWB Units** (`aind-units-nwb `_):
Export spike sorting results (units) to NWB format.
Each NWB file can contain multiple streams (e.g., probes), but only one continuous chunk of data (such as an
Open Ephys experiment+recording or an NWB ``ElectricalSeries``).
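
The fan-out performed by Job Dispatch (step 1) can be sketched as follows: one job description per probe, or per shank for multi-shank probes, each serialized to JSON. The function names and JSON field names are hypothetical; the real ``aind-ephys-job-dispatch`` output format is not described on this page.

.. code-block:: python

    import json


    def make_job_configs(session_name, probes):
        """Build one job dict per probe, or per shank for multi-shank probes.

        probes maps a probe name to its number of shanks.
        """
        jobs = []
        for probe, n_shanks in sorted(probes.items()):
            if n_shanks < 1:
                raise ValueError(f"{probe}: number of shanks must be >= 1")
            for shank in range(n_shanks):
                jobs.append(
                    {
                        "session": session_name,
                        "probe": probe,
                        # Single-shank probes are not split further.
                        "shank": shank if n_shanks > 1 else None,
                    }
                )
        return jobs


    def to_json_documents(jobs):
        """Serialize each job to its own JSON string (one file per job)."""
        return [json.dumps(job, sort_keys=True) for job in jobs]


    if __name__ == "__main__":
        jobs = make_job_configs("example_session", {"ProbeA": 1, "ProbeB": 4})
        for document in to_json_documents(jobs):
            print(document)

In this example a single-shank probe and a four-shank probe yield five independent jobs, each of which would run steps 2-6 on its own.
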
See :doc:`pipeline_steps` for more detailed information about each processing step.