.. _architecture:

Pipeline Architecture
=====================

This page provides a detailed architectural overview of the AIND Ephys Pipeline, including deployment modes, infrastructure components, and data flow.

Detailed Architecture Diagram
------------------------------

.. only:: html

   .. note::

      **Interactive Diagram:** Use your mouse to zoom (scroll) and pan (click + drag). All hyperlinks are clickable. A fullscreen button (⛶) may appear in the top-right corner when hovering over the diagram.

.. mermaid::

   flowchart TD
       %% Deployment paths
       subgraph code_ocean["🌊 Code Ocean Deployment"]
           direction TB
           co_main["pipeline/main.nf<br/>(Nextflow DSL1)<br/>Code Ocean Platform"]
           co_branches["Branch Selection:<br/>• co_kilosort4 (main)<br/>• co_kilosort25<br/>• co_spykingcircus2<br/>• co_*_opto variants"]
           co_main -.->|"Branch determines<br/>sorter"| co_branches
       end

       subgraph slurm_local["🖥️ SLURM/Local Deployment"]
           direction TB
           mb_main["pipeline/main_multi_backend.nf<br/>(Nextflow DSL2)<br/>Multi-backend Support"]
           subgraph executor["⚙️ Executor"]
               direction LR
               slurm_exec["SLURM<br/>Cluster execution"]
               local_exec["Local<br/>Single machine"]
           end
           mb_main -->|"Submitted to"| executor
       end

       co_main -->|"Copied from ➜"| mb_main

       %% Input/Output data
       input[("📥 Input Data<br/>(Ephys Session)")]
       output[("📤 Output<br/>NWB files + QC + Viz")]

       %% Hugging Face models
       subgraph hf_models["🤗 Hugging Face Models (UnitRefine)"]
           direction TB
           noise_model["noise_neural_classifier<br/>Noise vs. neural units"]
           sua_mua_model["sua_mua_classifier<br/>Single-unit vs. multi-unit"]
       end

       %% Container registry
       subgraph registry["☁️ GitHub Container Registry (ghcr.io)"]
           direction TB
           base["aind-ephys-pipeline-base<br/>General processing<br/>(tag: si-0.103.0)"]
           ks25["aind-ephys-spikesort-kilosort25<br/>Kilosort 2.5 sorter<br/>(tag: si-0.103.0)"]
           ks4["aind-ephys-spikesort-kilosort4<br/>Kilosort 4 sorter<br/>(tag: si-0.103.0)"]
           nwb["aind-ephys-pipeline-nwb<br/>NWB export<br/>(tag: si-0.103.0)"]
       end

       %% Common pipeline steps
       subgraph pipeline["📊 Processing Pipeline<br/>(SpikeInterface-based)"]
           direction TB
           step1["1. Job Dispatch<br/>aind-ephys-job-dispatch<br/>Generate parallel job JSONs<br/>(per probe/shank)"]
           step2["2. Preprocessing<br/>aind-ephys-preprocessing<br/>Phase shift • Highpass filter<br/>Denoising • Motion estimation"]
           step3a["3a. Kilosort2.5<br/>aind-ephys-spikesort-kilosort25"]
           step3b["3b. Kilosort4<br/>aind-ephys-spikesort-kilosort4<br/>(GPU required)"]
           step3c["3c. SpykingCircus2<br/>aind-ephys-spikesort-spykingcircus2"]
           step4["4. Postprocessing<br/>aind-ephys-postprocessing<br/>Amplitudes • Locations • PCA<br/>Correlograms • Quality metrics"]
           step5["5. Curation<br/>aind-ephys-curation<br/>QC thresholds<br/>UnitRefine classifier"]
           step6["6. Visualization<br/>aind-ephys-visualization<br/>Timeseries • Drift maps<br/>Figurl sorting summary"]
           step7["7. Results Collector<br/>aind-ephys-result-collector<br/>Aggregate parallel outputs"]
           step8["8. Quality Control<br/>aind-ephys-processing-qc<br/>Run QC checks"]
           step9["9. QC Collector<br/>aind-ephys-qc-collector<br/>Aggregate QC results"]
           step10["10. NWB Ecephys<br/>aind-ecephys-nwb<br/>Export raw/LFP data"]
           step11["11. NWB Units<br/>aind-units-nwb<br/>Export spike sorting results"]

           step1 --> step2
           step2 --> step3a & step3b & step3c
           step3a & step3b & step3c --> step4
           step4 --> step5
           step5 --> step6
           step2 & step3a & step3b & step3c & step4 & step5 & step6 --> step7
           step1 & step7 --> step8
           step8 --> step9
           step1 --> step10
           step10 & step7 --> step11
       end

       %% Data flow
       input -->|"Mounted as<br/>capsule/data/ecephys_session"| step1
       step7 & step9 & step11 -->|"Published to<br/>RESULTS_PATH"| output

       %% HF model usage
       noise_model -.->|"used by"| step5
       sua_mua_model -.->|"used by"| step5

       %% Container usage
       base -.->|"used by"| step1
       base -.->|"used by"| step2
       base -.->|"used by"| step4
       base -.->|"used by"| step5
       base -.->|"used by"| step6
       base -.->|"used by"| step7
       base -.->|"used by"| step8
       base -.->|"used by"| step9
       base -.->|"used by"| step3c
       ks25 -.->|"used by"| step3a
       ks4 -.->|"used by"| step3b
       nwb -.->|"used by"| step10
       nwb -.->|"used by"| step11

       co_main -.->|"Executes"| pipeline
       executor -.->|"Executes"| pipeline

       %% Version control
       versions["📋 capsule_versions.env<br/>Pins Git commit hashes<br/>for each step"]
       pipeline -.->|"Version controlled<br/>via"| versions

       %% Styling
       classDef deployment fill:#e1f5ff,stroke:#0066cc,stroke-width:2px
       classDef pipeline_step fill:#fff4e6,stroke:#ff9800,stroke-width:2px
       classDef sorter fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
       classDef data fill:#e8f5e9,stroke:#4caf50,stroke-width:3px
       classDef container fill:#fce4ec,stroke:#e91e63,stroke-width:2px
       classDef ml_model fill:#fff9e6,stroke:#ffc107,stroke-width:2px

       class co_main,mb_main,co_branches,slurm_exec,local_exec deployment
       class step1,step2,step4,step5,step6,step7,step8,step9,step10,step11 pipeline_step
       class step3a,step3b,step3c sorter
       class input,output data
       class base,ks25,ks4,nwb container
       class noise_model,sua_mua_model ml_model

Architecture Components
------------------------

Deployment Modes
~~~~~~~~~~~~~~~~

The pipeline supports two deployment strategies:

**Code Ocean Deployment**

- Uses ``pipeline/main.nf`` (Nextflow DSL1)
- Branch-based sorter selection
- Separate branches for each configuration:

  - ``main``/``co_kilosort4``: Kilosort4
  - ``co_kilosort25``: Kilosort2.5
  - ``co_spykingcircus2``: SpykingCircus2
  - ``*_opto`` variants, which add optogenetics artifact removal

**SLURM/Local Deployment**

- Uses ``pipeline/main_multi_backend.nf`` (Nextflow DSL2)
- Parameter-driven sorter selection
- Supports both SLURM clusters and local execution

Infrastructure Components
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Container Registry**

Four container images are pulled from the GitHub Container Registry (ghcr.io), all tagged ``si-0.103.0``:

- ``aind-ephys-pipeline-base``: general processing; used by steps 1, 2, and 4-9, and by the SpykingCircus2 sorter (step 3c)
- ``aind-ephys-spikesort-kilosort25``: Kilosort2.5 sorter (step 3a)
- ``aind-ephys-spikesort-kilosort4``: Kilosort4 sorter (step 3b; requires a GPU)
- ``aind-ephys-pipeline-nwb``: NWB export (steps 10-11)

**Machine Learning Models**

Pretrained UnitRefine classifiers from Hugging Face, used in step 5 (Curation):

- ``UnitRefine_noise_neural_classifier``: distinguishes noise from neural units
- ``UnitRefine_sua_mua_classifier``: classifies units as single-unit (SUA) or multi-unit activity (MUA)

Data Flow
~~~~~~~~~

**Input**: Electrophysiology session data is mounted into each container at ``capsule/data/ecephys_session``.

**Processing**: 11 processing steps; steps 2-6 run in parallel, one job per probe/shank (see step 1 below).

**Output**: Results are published to ``RESULTS_PATH``, including:

- Collected parallel job results: preprocessing, sorting, postprocessing, curation, and visualizations (step 7)
- Quality control reports (step 9)
- NWB files with raw/LFP data and spike sorting units (steps 10-11)

Version Control
~~~~~~~~~~~~~~~

Git commit hashes in ``capsule_versions.env`` pin the exact version of each processing step's repository, ensuring reproducibility across pipeline runs.

Pipeline Steps Detailed Breakdown
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Job Dispatch** (`aind-ephys-job-dispatch `_): Generates a list of JSON job files to be processed in parallel. Parallelization is performed over probes and, for multi-shank probes (e.g., NP2-4shank), over shanks; steps 2-6 (preprocessing through visualization) run in parallel across these jobs.
2. **Preprocessing** (`aind-ephys-preprocessing `_): Applies phase shift, highpass filtering, denoising (bad-channel removal followed by either common median referencing, ``"cmr"``, or a highpass spatial filter, ``"destripe"``), and motion estimation (optionally with correction).
3. **Spike Sorting**: Several spike sorters are available:

   - `Kilosort2.5 `_
   - `Kilosort4 `_
   - `SpykingCircus2 `_

4. **Postprocessing** (`aind-ephys-postprocessing `_): Removes duplicate units and computes amplitudes, spike/unit locations, PCA, correlograms, template similarity, template metrics, and quality metrics.
5. **Curation** (`aind-ephys-curation `_): Curates units using quality-metric thresholds (ISI violation ratio, presence ratio, and amplitude cutoff) and pretrained unit classifiers (`UnitRefine `_).
6. **Visualization** (`aind-ephys-visualization `_): Generates timeseries plots, drift maps, and a sorting summary in `figurl `_.
7. **Result Collection** (`aind-ephys-result-collector `_): Collects the output of all parallel jobs and copies the output folders to the results folder.
8. **Quality Control** (`aind-ephys-processing-qc `_): Runs quality control checks on the processing results.
9. **QC Collector** (`aind-ephys-qc-collector `_): Aggregates quality control results from the parallel jobs.
10. **NWB Ecephys** (`aind-ecephys-nwb `_): Exports raw/LFP electrophysiology data to NWB format.
11. **NWB Units** (`aind-units-nwb `_): Exports spike sorting results (units) to NWB format. Each NWB file can contain multiple streams (e.g., probes), but only one continuous chunk of data (such as an Open Ephys experiment+recording or an NWB ``ElectricalSeries``).

See :doc:`pipeline_steps` for more detailed information about each processing step.
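The step dependencies drawn in the architecture diagram form a directed acyclic graph. The minimal sketch below is not part of the pipeline code base; it uses Python's standard-library ``graphlib`` module (Python 3.9+) to group the steps of a single probe/shank job into "waves" whose members depend only on earlier waves. The step names and the single ``spikesort`` node, which stands in for whichever of steps 3a-3c is selected, are illustrative assumptions. The actual scheduling is defined by the Nextflow workflow files.

.. code-block:: python

   from graphlib import TopologicalSorter

   # Upstream dependencies of each step, transcribed from the architecture diagram.
   # "spikesort" stands in for whichever sorter (step 3a, 3b, or 3c) is selected.
   STEP_DEPENDENCIES = {
       "job_dispatch": set(),
       "preprocessing": {"job_dispatch"},
       "spikesort": {"preprocessing"},
       "postprocessing": {"spikesort"},
       "curation": {"postprocessing"},
       "visualization": {"curation"},
       "results_collector": {
           "preprocessing", "spikesort", "postprocessing", "curation", "visualization",
       },
       "processing_qc": {"job_dispatch", "results_collector"},
       "qc_collector": {"processing_qc"},
       "nwb_ecephys": {"job_dispatch"},
       "nwb_units": {"nwb_ecephys", "results_collector"},
   }


   def execution_waves(dependencies):
       """Group steps into waves; each step depends only on steps in earlier waves."""
       sorter = TopologicalSorter(dependencies)
       sorter.prepare()
       waves = []
       while sorter.is_active():
           ready = sorted(sorter.get_ready())  # sorted only for a deterministic display
           waves.append(ready)
           sorter.done(*ready)
       return waves


   if __name__ == "__main__":
       for i, wave in enumerate(execution_waves(STEP_DEPENDENCIES), start=1):
           print(f"wave {i}: {', '.join(wave)}")

Running it places ``nwb_ecephys`` in the same wave as ``preprocessing``, reflecting that step 10 depends only on step 1 in the diagram, while ``nwb_units`` and ``processing_qc`` both wait for the results collector.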
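To illustrate the fan-out performed by the Job Dispatch step, here is a hypothetical sketch that writes one JSON job file per probe/shank using only the standard library. The JSON fields (``session_name``, ``probe``, ``shank``), the file-naming scheme, and the ``dispatch_jobs`` helper are assumptions made for illustration; they are not the schema or API of ``aind-ephys-job-dispatch``.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path


   def dispatch_jobs(session_name, shanks_per_probe, output_dir):
       """Write one JSON job description per probe/shank and return the file paths.

       Hypothetical schema: each job records the session, the probe, and the shank index.
       """
       output_dir = Path(output_dir)
       output_dir.mkdir(parents=True, exist_ok=True)
       job_files = []
       for probe, n_shanks in shanks_per_probe.items():
           for shank in range(n_shanks):
               job = {"session_name": session_name, "probe": probe, "shank": shank}
               job_file = output_dir / f"job_{probe}_shank{shank}.json"
               job_file.write_text(json.dumps(job, indent=2))
               job_files.append(job_file)
       return job_files


   if __name__ == "__main__":
       # Example: one single-shank probe and one 4-shank probe (e.g., NP2-4shank).
       with tempfile.TemporaryDirectory() as tmp:
           files = dispatch_jobs("ecephys_session", {"ProbeA": 1, "ProbeB": 4}, tmp)
           print(len(files), "jobs:", [f.name for f in files])

In this example the two probes yield five independent jobs, each of which would then flow through steps 2-6 on its own.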
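The threshold-based part of the Curation step amounts to a boolean filter over per-unit quality metrics. The sketch below assumes the SpikeInterface quality-metric names ``isi_violations_ratio``, ``presence_ratio``, and ``amplitude_cutoff``, and uses illustrative cutoff values that may differ from what ``aind-ephys-curation`` actually applies. The UnitRefine classifiers are not modeled here.

.. code-block:: python

   import math
   import operator

   # Illustrative cutoffs (assumed values, not necessarily the pipeline defaults).
   DEFAULT_THRESHOLDS = {
       "isi_violations_ratio": ("<", 0.5),
       "presence_ratio": (">", 0.8),
       "amplitude_cutoff": ("<", 0.1),
   }

   _OPS = {"<": operator.lt, ">": operator.gt}


   def passes_qc(metrics, thresholds=DEFAULT_THRESHOLDS):
       """Return True if a unit's metrics satisfy every threshold.

       A missing or NaN metric counts as a failure.
       """
       for name, (op, cutoff) in thresholds.items():
           value = metrics.get(name)
           if value is None or math.isnan(value):
               return False
           if not _OPS[op](value, cutoff):
               return False
       return True


   def label_units(unit_metrics, thresholds=DEFAULT_THRESHOLDS):
       """Map each unit id to True (passes QC) or False (fails QC)."""
       return {unit_id: passes_qc(m, thresholds) for unit_id, m in unit_metrics.items()}


   if __name__ == "__main__":
       units = {
           0: {"isi_violations_ratio": 0.1, "presence_ratio": 0.95, "amplitude_cutoff": 0.02},
           1: {"isi_violations_ratio": 1.3, "presence_ratio": 0.95, "amplitude_cutoff": 0.02},
           2: {"isi_violations_ratio": 0.0, "presence_ratio": 0.4, "amplitude_cutoff": float("nan")},
       }
       print(label_units(units))  # {0: True, 1: False, 2: False}

Treating a missing or NaN metric as a failure is a conservative design choice for this sketch; a real curation step might instead leave such units unlabeled.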
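Because ``capsule_versions.env`` pins each processing step to a Git commit, a launch script could check that every step is pinned before a run. The following sketch assumes a simple ``KEY=value`` format with ``#`` comments and treats only full 40-character hexadecimal SHA-1 hashes as pinned. The keys in the example (``JOB_DISPATCH``, ``PREPROCESSING``) and the helper names are hypothetical, so check the actual file for its variable names.

.. code-block:: python

   import re

   # Full 40-character hex SHA-1 commit hash; shortened hashes or branch names
   # are treated as "not pinned" (an assumption of this sketch).
   _COMMIT_RE = re.compile(r"^[0-9a-f]{40}$")


   def parse_versions(text):
       """Parse KEY=value lines into a dict, skipping blank lines and # comments."""
       versions = {}
       for line_no, raw in enumerate(text.splitlines(), start=1):
           line = raw.strip()
           if not line or line.startswith("#"):
               continue
           key, sep, value = line.partition("=")
           if not sep:
               raise ValueError(f"line {line_no}: expected KEY=value, got {raw!r}")
           versions[key.strip()] = value.strip().strip('"')
       return versions


   def unpinned_steps(versions):
       """Return the keys whose value does not look like a full commit hash."""
       return sorted(k for k, v in versions.items() if not _COMMIT_RE.match(v))


   if __name__ == "__main__":
       example = """
       # Hypothetical keys and hashes
       JOB_DISPATCH=3f2c1a9e0b7d4c6a8e5f1b2d3c4a5e6f7a8b9c0d
       PREPROCESSING=main
       """
       print(unpinned_steps(parse_versions(example)))  # ['PREPROCESSING']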