Pipeline Parameters¶

Global Parameters¶

The pipeline accepts several global parameters that control its overall behavior:

--n_jobs N_JOBS                 Number of parallel jobs (for local deployment)
--runmode {full,fast}          Processing mode ('fast' skips some steps like motion correction)
--sorter {kilosort25,kilosort4,spykingcircus2}   Spike sorter selection

Parameter File¶

A parameter file can be used to set all parameters at once. This is the recommended way to configure the pipeline, especially for complex setups. The parameter file should be in JSON format and you can use the pipeline/default_params.json file as a template.

To use a parameter file, specify it with the --params_file option:

--params_file PATH_TO_PARAMS_FILE
# Example: --params_file pipeline/default_params.json
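
One way to work from the template is to load the defaults and overlay only the values you want to change. The sketch below is a minimal illustration, not part of the pipeline: `deep_update` is a hypothetical helper, and the small `template` dict stands in for the contents of pipeline/default_params.json.

```python
import copy
import json


def deep_update(base, overrides):
    """Return a copy of `base` with `overrides` merged in recursively.

    Hypothetical helper for illustration; it is not provided by the pipeline.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the parsed contents of pipeline/default_params.json.
template = {
    "job_dispatch": {"input": "openephys", "debug": False, "debug_duration": 30},
    "spikesorting": {"sorter": None},
}

# Only the values that differ from the template.
overrides = {
    "job_dispatch": {"input": "spikeglx", "debug": True},
    "spikesorting": {"sorter": "kilosort4"},
}

params = deep_update(template, overrides)
params_json = json.dumps(params, indent=4)  # text to save as, e.g., my_params.json
```

Writing `params_json` to a file and passing that file to --params_file keeps the untouched defaults explicit in the saved configuration.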

Note

In the spikesorting section of the parameter file, you can specify the sorter and its parameters. The sorter field, if specified and not null, will override the command line --sorter parameter.
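
The precedence described in this note can be sketched as follows. `resolve_sorter` is a hypothetical helper written for illustration; the pipeline's own logic may differ in detail.

```python
def resolve_sorter(params, cli_sorter):
    """Pick the sorter name following the precedence described in the note.

    `params` is the parsed parameter file (a dict) and `cli_sorter` is the
    value passed to --sorter. Hypothetical helper, not pipeline code.
    """
    file_sorter = params.get("spikesorting", {}).get("sorter")
    # A non-null "sorter" field in the parameter file wins over --sorter.
    return file_sorter if file_sorter is not None else cli_sorter


effective_a = resolve_sorter({"spikesorting": {"sorter": "spykingcircus2"}}, "kilosort4")
effective_b = resolve_sorter({"spikesorting": {"sorter": None}}, "kilosort4")
effective_c = resolve_sorter({}, "kilosort25")
```

Here `effective_a` comes from the parameter file, while `effective_b` and `effective_c` fall back to the command-line value.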

Parameter Editor Webapp¶

A browser-based parameter editor is included in params_app/. It reads the JSON schema (pipeline/default_params_schema.json) and renders an interactive form for creating and editing parameter files, with built-in validation.

To run the webapp, use the included launcher script (requires Python 3):

python params_app/serve.py

This starts a local server from the repository root, prints the URL, and opens it in your browser. An optional port argument is supported:

python params_app/serve.py 9000

The webapp provides two tabs:

Editor — an interactive form with all pipeline parameters, inline descriptions, enum dropdowns, nullable toggles, and collapsible sections. You can generate, download, copy, or import JSON files.
Validate JSON — paste or upload an existing JSON file to validate it against the schema. Errors are shown with their JSON path and message.

No installation or build step is required — the app is fully static.

JSON Schema¶

The file pipeline/default_params_schema.json is a JSON Schema (draft-07) that formally describes every parameter, its type, allowed values, and defaults. You can use it for:

Editor integration — VS Code, PyCharm, and other editors can provide autocompletion and inline validation when you add a $schema reference at the top of your params file:
```
{
    "$schema": "./default_params_schema.json",
    "job_dispatch": { "input": "nwb" }
}
```

Programmatic validation — validate parameter files in Python:

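import json

import jsonschema  # third-party package: pip install jsonschema

# Load the schema and the parameter file to check.
with open("pipeline/default_params_schema.json") as f:
    schema = json.load(f)
with open("my_params.json") as f:
    params = json.load(f)

# Raises jsonschema.ValidationError if the parameters do not match the schema.
jsonschema.validate(params, schema)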

Process-Specific Arguments¶

Parameters can be specified via the parameter file or passed directly as command line arguments when running the pipeline. CLI arguments will override any conflicting parameters set in the parameter file.

Each pipeline step can be configured with specific parameters using the format:

--{step_name}_args="{args}"
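
When the command line is assembled from a script, each step option can be built as a single argument. This is a minimal sketch: `step_args_option` is a hypothetical helper, and the flags in the example are the job dispatch flags used elsewhere on this page.

```python
import shlex


def step_args_option(step_name, args):
    """Format one --{step_name}_args="{args}" option as a single argv entry.

    `args` is the list of tokens for that step. Hypothetical helper for
    building a command line from Python.
    """
    return f"--{step_name}_args={shlex.join(args)}"


option = step_args_option("job_dispatch", ["--input", "spikeglx", "--debug"])
# When passed as one element of an argv list (for example to subprocess.run),
# no extra shell quoting is needed around the value.
```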

Job Dispatch Parameters¶

Parameter file section (job_dispatch):

{
    "split_segments": true,
    "split_groups": true,
    "debug": false,
    "debug_duration": 30,
    "skip_timestamps_check": false,
    "multi_session": false,
    "input": "openephys",
    "spikeinterface_info": null
}

split_segments

If true, each recording segment is processed independently. If false, all segments are concatenated before processing.

split_groups

If true, different electrode groups (e.g., probes) are dispatched as separate parallel jobs. If false, all groups are combined into a single job.

debug

Enable debug mode: the recording is clipped to debug_duration seconds to allow rapid end-to-end testing of the pipeline.

debug_duration

Duration in seconds to which the recording is clipped when debug is true.

skip_timestamps_check

Skip validation of sample timestamps. Useful when timestamps are absent or known to be unreliable (e.g. some Open Ephys recordings).

multi_session

If true, the data folder is expected to contain multiple session sub-folders, each of which is processed independently.

input

Data loader (reader) to use. One of aind, spikeglx, openephys, nwb, or spikeinterface. Use spikeinterface together with spikeinterface_info (--spikeinterface-info on the command line) to load any format supported by SpikeInterface.

spikeinterface_info

JSON string containing the information needed to load a recording with SpikeInterface when input is set to spikeinterface. It includes:

reader_type (required): string with the reader type (e.g. 'plexon', 'neuralynx', 'intan'). Use 'spikeinterface' for any format supported by SpikeInterface's universal reader.
reader_kwargs (optional): dictionary with the reader keyword arguments (e.g. {'folder': '/path/to/folder'}). If omitted, the reader is created with default parameters.
keep_stream_substrings (optional): string or list of strings identifying the streams to load (e.g. 'AP' or ['AP', 'LFP']).
skip_stream_substrings (optional): string or list of strings with substrings used to skip streams (e.g. 'NIDQ' or ['USB', 'EVENTS']).
probe_paths (optional): string or dict with the path(s) to a ProbeInterface JSON file (e.g. '/path/to/probe.json'). If a dict is provided, each key is a stream name and each value is the probe path for that stream. probe_paths is required if the reader doesn't load the probe automatically.

{
    "reader_type": "intan",
    "reader_kwargs": {
        "file_path": "/path/to/intan.rhd"
    },
    "skip_stream_substrings": ["EVENTS"],
    "probe_paths": "path/to/probe.json"
}
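
The keep/skip stream fields can be pictured as a substring filter over stream names. The sketch below is an illustration under stated assumptions: `as_list` and `select_streams` are hypothetical helpers, the stream names are invented, and the matching rule (keep when any "keep" substring matches and no "skip" substring matches) is an assumed reading of the field descriptions.

```python
import json


def as_list(value):
    """Normalize a string-or-list field to a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def select_streams(stream_names, info):
    """Filter stream names with the keep/skip substring fields.

    Assumed semantics: a stream is kept when it contains any "keep"
    substring (or no keep list is given) and contains no "skip" substring.
    """
    keep = as_list(info.get("keep_stream_substrings"))
    skip = as_list(info.get("skip_stream_substrings"))
    selected = []
    for name in stream_names:
        if keep and not any(s in name for s in keep):
            continue
        if any(s in name for s in skip):
            continue
        selected.append(name)
    return selected


info = {
    "reader_type": "intan",
    "reader_kwargs": {"file_path": "/path/to/intan.rhd"},
    "keep_stream_substrings": "AP",
    "skip_stream_substrings": ["NIDQ"],
}
if "reader_type" not in info:
    raise ValueError("reader_type is required")

# The field is described as a JSON string, so serialize the dict.
spikeinterface_info = json.dumps(info)

# Hypothetical stream names, for illustration only.
selected = select_streams(["stream-AP", "stream-LFP", "NIDQ"], info)
```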

Note

If the reader needs extra packages installed, specify them in the EXTRA_INSTALLS variable in the capsule_versions.env file (e.g. EXTRA_INSTALLS="mtscomp").

Preprocessing Parameters¶

Parameter file section (preprocessing):

{
    "job_kwargs": {
        "chunk_duration": "1s",
        "progress_bar": false
    },
    "min_preprocessing_duration": 120,
    "custom_preprocessing_pipeline": null,
    "denoising_strategy": "cmr",
    "filter_type": "highpass",
    "highpass_filter": {
        "freq_min": 300.0,
        "margin_ms": 5.0
    },
    "bandpass_filter": {
        "freq_min": 300.0,
        "freq_max": 6000.0,
        "margin_ms": 5.0
    },
    "phase_shift": {
        "margin_ms": 100.0
    },
    "detect_bad_channels": {
        "method": "coherence+psd",
        "dead_channel_threshold": -0.5,
        "noisy_channel_threshold": 1.0,
        "outside_channel_threshold": -0.3,
        "outside_channels_location": "top",
        "n_neighbors": 11,
        "seed": 0
    },
    "remove_out_channels": true,
    "remove_bad_channels": true,
    "max_bad_channel_fraction": 0.5,
    "common_reference": {
        "reference": "global",
        "operator": "median"
    },
    "highpass_spatial_filter": {
        "n_channel_pad": 60,
        "n_channel_taper": null,
        "direction": "y",
        "apply_agc": true,
        "agc_window_length_s": 0.01,
        "highpass_butter_order": 3,
        "highpass_butter_wn": 0.01
    },
    "motion_correction": {
        "compute": true,
        "apply": false,
        "preset": "dredge_fast",
        "detect_kwargs": {},
        "select_kwargs": {},
        "localize_peaks_kwargs": {},
        "estimate_motion_kwargs": {
            "win_step_norm": 0.1,
            "win_scale_norm": 0.1
        },
        "interpolate_motion_kwargs": {}
    }
}

job_kwargs.chunk_duration

Size of each processing chunk, e.g. "1s". Larger chunks reduce overhead but require more memory.
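
As a rough guide to the memory trade-off, the size of one chunk of raw traces can be estimated from the chunk duration. The numbers below (30 kHz sampling, 384 channels, 2-byte samples) are assumed example values, and `chunk_bytes` is a hypothetical helper; intermediate buffers and parallel jobs add to the total.

```python
def chunk_bytes(chunk_duration_s, sampling_rate_hz, n_channels, bytes_per_sample):
    """Approximate size in bytes of one chunk of raw traces."""
    n_samples = int(chunk_duration_s * sampling_rate_hz)
    return n_samples * n_channels * bytes_per_sample


# Assumed example values: 30 kHz sampling, 384 channels, int16 samples.
one_second = chunk_bytes(1.0, 30_000, 384, 2)    # about 23 MB
ten_seconds = chunk_bytes(10.0, 30_000, 384, 2)  # about 230 MB
```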

job_kwargs.progress_bar

Show a progress bar during chunk-based processing.

min_preprocessing_duration

Minimum recording duration in seconds required to run preprocessing. Recordings shorter than this value are skipped entirely.

custom_preprocessing_pipeline

A dictionary defining a fully custom preprocessing pipeline. When null, the default pipeline (filter → phase-shift → bad-channel detection → CMR/destripe → motion) is used. For how to build a pipeline from dictionaries, see the SpikeInterface documentation: https://spikeinterface.readthedocs.io/en/stable/how_to/build_pipeline_with_dicts.html

denoising_strategy

Strategy used for channel-level denoising after filtering:

"cmr" — Common Median Reference: subtracts the median trace computed across all (good) channels.
"destripe" — IBL destriping: applies a high-pass spatial filter along the probe axis (parameters controlled by highpass_spatial_filter).

filter_type

Temporal filter applied to the raw signal before phase-shift correction:

"highpass" — uses highpass_filter settings.
"bandpass" — uses bandpass_filter settings.

highpass_filter.freq_min

High-pass cutoff frequency in Hz.

highpass_filter.margin_ms

Margin in milliseconds added at segment boundaries to reduce filter edge artifacts.

bandpass_filter.freq_min / bandpass_filter.freq_max

Lower and upper cutoff frequencies in Hz for the bandpass filter (only used when filter_type is "bandpass").

bandpass_filter.margin_ms

Boundary margin in ms for the bandpass filter.

phase_shift.margin_ms

Margin in ms used for inter-sample phase-shift correction. This step compensates for the time offset introduced by multiplexed ADCs (e.g. Neuropixels).

detect_bad_channels.method

Algorithm used to classify bad channels. "coherence+psd" combines local signal coherence with power-spectral density to identify dead, noisy, and out-of-brain channels.

detect_bad_channels.dead_channel_threshold

Coherence threshold below which a channel is classified as dead/disconnected.

detect_bad_channels.noisy_channel_threshold

Coherence threshold above which a channel is classified as excessively noisy (applied together with a PSD criterion).

detect_bad_channels.outside_channel_threshold

Threshold used to detect channels outside the brain (based on PSD features).

detect_bad_channels.outside_channels_location

Expected position of out-of-brain channels on the probe: "top" (the end farthest from the tip, which is the usual case for an inserted probe) or "bottom" (the tip end).

detect_bad_channels.n_neighbors

Number of neighboring channels used when computing local signal coherence.

detect_bad_channels.seed

Random seed for reproducibility of the bad channel detection algorithm.

remove_out_channels

If true, channels detected as outside the brain are removed from further processing.

remove_bad_channels

If true, dead and noisy channels are removed from further processing.

max_bad_channel_fraction

Maximum fraction of total channels that may be classified as bad before the entire recording is skipped. For example, 0.5 means preprocessing is aborted if more than half of the channels are bad.
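
The interaction between remove_out_channels, remove_bad_channels, and max_bad_channel_fraction can be sketched as below. `channels_to_keep` is a hypothetical helper, the label names follow SpikeInterface's "good"/"dead"/"noise"/"out" convention, and counting every non-good channel toward the bad fraction is an assumption.

```python
def channels_to_keep(labels, remove_out_channels=True, remove_bad_channels=True,
                     max_bad_channel_fraction=0.5):
    """Return channel ids to keep, or None if the recording should be skipped.

    `labels` maps channel id -> "good", "dead", "noise", or "out".
    Assumption: all non-good channels count toward the bad fraction.
    """
    n_bad = sum(1 for label in labels.values() if label != "good")
    if n_bad / len(labels) > max_bad_channel_fraction:
        return None  # too many bad channels: skip this recording
    drop = set()
    if remove_out_channels:
        drop.add("out")
    if remove_bad_channels:
        drop.update({"dead", "noise"})
    return [ch for ch, label in labels.items() if label not in drop]


labels = {"ch0": "good", "ch1": "dead", "ch2": "good", "ch3": "out", "ch4": "good"}
kept = channels_to_keep(labels)
kept_with_bad = channels_to_keep(labels, remove_bad_channels=False)
skipped = channels_to_keep({"ch0": "dead", "ch1": "noise", "ch2": "good"})
```

In this example `skipped` is None because two of three channels are bad, which exceeds the 0.5 limit.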

common_reference.reference

Scope of the common reference calculation. "global" uses all good channels on the probe.

common_reference.operator

Aggregation function for the common reference. "median" is robust to outlier channels.
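
With a "global" reference and the "median" operator, common referencing amounts to subtracting, at every time sample, the median across channels. The pure-Python sketch below illustrates the arithmetic on a tiny made-up array; the pipeline itself relies on SpikeInterface, and `common_median_reference` is a hypothetical name.

```python
from statistics import median


def common_median_reference(traces):
    """Subtract the per-sample median across channels.

    `traces` is a list of samples, each a list with one value per channel.
    """
    referenced = []
    for sample in traces:
        m = median(sample)
        referenced.append([value - m for value in sample])
    return referenced


# Three samples x four channels. The second sample is the first plus a shared
# offset of 10, mimicking noise common to all channels. The third sample has
# a deflection on a single channel only.
traces = [
    [1.0, 2.0, 3.0, 4.0],
    [11.0, 12.0, 13.0, 14.0],
    [0.0, 0.0, 5.0, 0.0],
]
referenced = common_median_reference(traces)
```

The shared offset disappears after referencing, while the single-channel deflection in the last sample is left intact, which is why the median is described as robust to outlier channels.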

highpass_spatial_filter

Parameters for IBL destriping. Only used when denoising_strategy is "destripe".

n_channel_pad — number of channels padded at each edge before spatial filtering.
n_channel_taper — number of channels used for the cosine taper (null = auto).
direction — axis along which to apply the spatial filter ("y" = depth axis).
apply_agc — apply Automatic Gain Control before spatial filtering.
agc_window_length_s — AGC window length in seconds.
highpass_butter_order — order of the Butterworth spatial high-pass filter.
highpass_butter_wn — normalised cutoff frequency of the spatial filter (0–1).

motion_correction.compute

If true, estimate probe drift and save the motion object. The motion estimate is always saved to results even if it is not applied to the recording.

motion_correction.apply

If true, apply motion interpolation to the recording traces. If false (default), motion is computed and saved but the raw traces are left unmodified; postprocessing can optionally apply it later.

motion_correction.preset

Named preset controlling the full motion-estimation workflow (detection, localisation, estimation).

Available motion presets:

dredge
dredge_fast (default)
nonrigid_accurate
nonrigid_fast_and_accurate
rigid_fast
kilosort_like

motion_correction.estimate_motion_kwargs

Extra keyword arguments forwarded to the motion estimator. win_step_norm and win_scale_norm control the temporal and spatial window step/scale (normalised to the recording duration and probe length, respectively).

motion_correction.detect_kwargs / select_kwargs / localize_peaks_kwargs / interpolate_motion_kwargs

Additional keyword arguments forwarded to the peak detection, peak selection, peak localisation, and motion interpolation steps, respectively. Leave empty ({}) to use preset defaults.

Spike Sorting Parameters¶

Parameter file section (spikesorting):

{
    "sorter": null,
    "{sorter_name}": {
        "job_kwargs": {
            "chunk_duration": "1s",
            "progress_bar": false
        },
        "skip_motion_correction": false,
        "min_drift_channels": 6,
        "raise_if_fails": true,
        "clear_cache": false,
        "sorter": {
            // sorter-specific parameters forwarded to SpikeInterface
        }
    }
}

Note

The kilosort4, kilosort25, and spykingcircus2 sub-objects inside spikesorting hold sorter-specific parameters, which are documented separately for each sorter.

sorter: Selects the spike sorter to use. Accepted values: "kilosort4", "kilosort25", "spykingcircus2". When null, the sorter is determined by the --sorter CLI argument.
{sorter_name}.job_kwargs: Parallel processing chunk settings for the spike sorting step (same format as other steps).
{sorter_name}.skip_motion_correction: If true, disables the sorter’s built-in motion correction (useful when motion has already been handled in preprocessing).
{sorter_name}.min_drift_channels: Minimum number of channels required to activate the sorter’s internal motion correction. Recordings with fewer channels skip drift correction automatically.
{sorter_name}.raise_if_fails: If true, a sorting failure raises an exception and stops the pipeline for that recording. If false, the failure is logged and the pipeline continues with the remaining recordings.
{sorter_name}.clear_cache: (Kilosort4 only) Force PyTorch to release its memory cache between memory-intensive operations. Useful on GPUs with limited VRAM.
{sorter_name}.sorter: Dictionary of sorter-specific parameters forwarded directly to the SpikeInterface sorter wrapper (e.g. batch_size, Th_universal for Kilosort4). Refer to the SpikeInterface documentation for the full list of accepted parameters per sorter.
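
The two rules governing the sorter's built-in motion correction (skip_motion_correction and min_drift_channels) can be summarized in a few lines. This is a sketch of the described behavior, not the pipeline's code; `sorter_motion_enabled` is a hypothetical name, and treating a recording with exactly min_drift_channels channels as eligible is an assumption.

```python
def sorter_motion_enabled(n_channels, skip_motion_correction=False, min_drift_channels=6):
    """Decide whether the sorter's built-in motion correction should run."""
    if skip_motion_correction:
        return False
    # Recordings with fewer channels than the minimum skip drift correction.
    return n_channels >= min_drift_channels


dense_probe = sorter_motion_enabled(384)
tetrode = sorter_motion_enabled(4)
forced_off = sorter_motion_enabled(384, skip_motion_correction=True)
```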

Postprocessing Parameters¶

Parameter file section (postprocessing):

{
    "job_kwargs": {
        "chunk_duration": "1s",
        "progress_bar": false
    },
    "use_motion_corrected": false,
    "sparsity": {
        "method": "radius",
        "radius_um": 100
    },
    "duplicate_threshold": 0.9,
    "return_in_uV": true,
    "extensions": {
        "random_spikes": {
            "max_spikes_per_unit": 500,
            "method": "uniform",
            "margin_size": null,
            "seed": null
        },
        "noise_levels": {
            "num_chunks_per_segment": 20,
            "chunk_size": 10000,
            "seed": null
        },
        "waveforms": {
            "ms_before": 2.0,
            "ms_after": 3.0,
            "dtype": null
        },
        "templates": {},
        "spike_amplitudes": {
            "peak_sign": "neg"
        },
        "template_similarity": {
            "method": "l1"
        },
        "correlograms": {
            "window_ms": 50.0,
            "bin_ms": 1.0
        },
        "isi_histograms": {
            "window_ms": 100.0,
            "bin_ms": 5.0
        },
        "unit_locations": {
            "method": "monopolar_triangulation"
        },
        "spike_locations": {
            "method": "grid_convolution"
        },
        "template_metrics": {
            "upsampling_factor": 10,
            "sparsity": null,
            "include_multi_channel_metrics": true
        },
        "principal_components": {
            "n_components": 5,
            "mode": "by_channel_local",
            "whiten": true
        },
        "quality_metrics": {
            "metric_names": [
                "num_spikes", "firing_rate", "presence_ratio",
                "snr", "isi_violation", "rp_violation",
                "sliding_rp_violation", "amplitude_cutoff",
                "amplitude_median", "amplitude_cv",
                "synchrony", "firing_range", "drift",
                "isolation_distance", "l_ratio", "d_prime",
                "nearest_neighbor", "silhouette"
            ],
            "metric_params": {
                "presence_ratio": { "bin_duration_s": 60 },
                "snr": { "peak_sign": "neg", "peak_mode": "extremum" },
                "isi_violation": { "isi_threshold_ms": 1.5, "min_isi_ms": 0 },
                "rp_violation": { "refractory_period_ms": 1, "censored_period_ms": 0.0 },
                "sliding_rp_violation": {
                    "bin_size_ms": 0.25, "window_size_s": 1,
                    "exclude_ref_period_below_ms": 0.5, "max_ref_period_ms": 10,
                    "contamination_values": null
                },
                "amplitude_cutoff": {
                    "peak_sign": "neg", "num_histogram_bins": 100,
                    "histogram_smoothing_value": 3, "amplitudes_bins_min_ratio": 5
                },
                "amplitude_median": { "peak_sign": "neg" },
                "amplitude_cv": {
                    "average_num_spikes_per_bin": 50, "percentiles": [5, 95],
                    "min_num_bins": 10, "amplitude_extension": "spike_amplitudes"
                },
                "firing_range": { "bin_size_s": 5, "percentiles": [5, 95] },
                "synchrony": { "synchrony_sizes": [2, 4, 8] },
                "nearest_neighbor": { "max_spikes": 10000, "n_neighbors": 4 },
                "silhouette": { "method": ["simplified"] }
            }
        }
    }
}

job_kwargs: Parallel processing chunk settings (same format as other steps).
use_motion_corrected: If true and motion was estimated but not applied during preprocessing, motion interpolation is applied to the recording before computing postprocessing extensions. Has no effect if motion correction was already applied or was not computed.
sparsity.method: Strategy for selecting the subset of channels associated with each unit. "radius" retains all channels within radius_um µm of the estimated unit location.
sparsity.radius_um: Radius in micrometres around each unit’s estimated location used for sparse channel selection.
duplicate_threshold: Template correlation threshold above which two units are considered duplicates. The unit with fewer spikes is removed to avoid counting the same neuron twice.
return_in_uV: If true, waveforms and templates are returned in microvolts (µV) by applying the recording’s gain/offset. If false, values remain in raw ADC counts.
extensions: Parameters for the SpikeInterface extensions. See the SpikeInterface documentation for the full list of available extensions and their parameters.
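
The conditions under which use_motion_corrected takes effect can be written as a small truth function. This is a sketch of the description above; `apply_motion_in_postprocessing` is a hypothetical name, and `compute` and `apply` stand for the preprocessing motion_correction flags.

```python
def apply_motion_in_postprocessing(compute, apply, use_motion_corrected):
    """Return True when postprocessing should interpolate motion itself.

    `compute` and `apply` are the preprocessing motion_correction flags.
    """
    motion_available = compute
    already_applied = compute and apply
    return use_motion_corrected and motion_available and not already_applied


estimated_only = apply_motion_in_postprocessing(True, False, True)
already_applied = apply_motion_in_postprocessing(True, True, True)
not_computed = apply_motion_in_postprocessing(False, False, True)
not_requested = apply_motion_in_postprocessing(True, False, False)
```

Only the first case, where motion was estimated but not applied and use_motion_corrected is true, leads to interpolation during postprocessing.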

Curation Parameters¶

Parameter file section (curation):

{
    "job_kwargs": {
        "chunk_duration": "1s",
        "progress_bar": false
    },
    "query": "isi_violations_ratio < 0.5 and presence_ratio > 0.8 and amplitude_cutoff < 0.1",
    "noise_neural_classifier": "SpikeInterface/UnitRefine_noise_neural_classifier",
    "sua_mua_classifier": "SpikeInterface/UnitRefine_sua_mua_classifier"
}

job_kwargs

Parallel processing chunk settings (same format as other steps).

query

A pandas-style query string applied to the quality metrics table. Units that do not satisfy the condition are labelled as "bad" (they are retained in the output but flagged). Example:

"isi_violations_ratio < 0.5 and presence_ratio > 0.8 and amplitude_cutoff < 0.1"

Any quality metric column name can be used in the expression. Set to null or "" to skip query-based curation.
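
To make the default expression concrete, the sketch below evaluates the same three conditions in plain Python for a few units. The metric values are invented for illustration, and `passes_default_query` is a hypothetical helper; the pipeline evaluates the query string itself against the quality metrics table.

```python
def passes_default_query(metrics):
    """Plain-Python equivalent of the default query string for one unit."""
    return (
        metrics["isi_violations_ratio"] < 0.5
        and metrics["presence_ratio"] > 0.8
        and metrics["amplitude_cutoff"] < 0.1
    )


# Hypothetical quality metrics for three units.
units = {
    "unit0": {"isi_violations_ratio": 0.1, "presence_ratio": 0.95, "amplitude_cutoff": 0.02},
    "unit1": {"isi_violations_ratio": 0.7, "presence_ratio": 0.99, "amplitude_cutoff": 0.01},
    "unit2": {"isi_violations_ratio": 0.2, "presence_ratio": 0.60, "amplitude_cutoff": 0.05},
}

# Units are kept in the output either way; failing units are only flagged.
passes = {unit_id: passes_default_query(m) for unit_id, m in units.items()}
```

Here unit1 fails on ISI violations and unit2 fails on presence ratio, so both would be flagged while unit0 passes.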

noise_neural_classifier

HuggingFace model ID for the noise-vs-neural unit classifier (part of the UnitRefine suite). The model takes waveform features as input and predicts whether each unit represents a real neuron or recording noise.

sua_mua_classifier

HuggingFace model ID for the single-unit (SUA) vs. multi-unit (MUA) classifier. Predicts whether a unit is a well-isolated single neuron or a mixture of multiple neurons.

NWB Ecephys Parameters¶

Parameter file section (nwb.ecephys):

{
    "backend": "zarr",
    "stub": false,
    "stub_seconds": 10,
    "write_lfp": true,
    "write_raw": false,
    "lfp_temporal_factor": 2,
    "lfp_spatial_factor": 4,
    "lfp_highpass_freq_min": 0.1,
    "surface_channel_agar_probes_indices": "",
    "lfp": {
        "filter": {
            "freq_min": 0.1,
            "freq_max": 500
        },
        "sampling_rate": 2500
    }
}

backend: NWB file format. "zarr" produces a chunked, cloud-friendly Zarr store; "hdf5" produces a standard HDF5 .nwb file.
stub: If true, write a truncated version of the file for quick validation and testing.
stub_seconds: Duration in seconds of the stub recording written when stub is true.
write_lfp: If true, include the LFP ElectricalSeries in the NWB file.
write_raw: If true, include the raw (unfiltered, full-bandwidth) ElectricalSeries in the NWB file. Note: this significantly increases output file size.
lfp_temporal_factor: Temporal downsampling factor applied to the LFP band before writing. A value of 2 halves the sample rate. Use 0 or 1 to keep all samples.
lfp_spatial_factor: Channel subsampling stride for the LFP band. A value of 4 retains every 4th channel. Use 0 or 1 to retain all channels.
lfp_highpass_freq_min: High-pass cutoff frequency in Hz applied to the LFP band before writing. Use 0 to skip this filter.
surface_channel_agar_probes_indices: JSON string mapping probe names to the index of the most superficial channel still in tissue, used for common-median referencing on probes inserted through agar. Example: {"ProbeA": 350, "ProbeB": 360}. Leave empty ("") when not applicable.
lfp.filter.freq_min / lfp.filter.freq_max: Bandpass filter bounds (Hz) that define the LFP frequency band applied before downsampling.
lfp.sampling_rate: Target sampling rate in Hz for the LFP band after temporal downsampling.
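
The effect of lfp_temporal_factor and lfp_spatial_factor can be worked through with simple arithmetic. The sketch below assumes a 2500 Hz input with 384 channels and assumes that channel subsampling starts at channel 0; `lfp_subsampling` is a hypothetical helper, not the pipeline's implementation.

```python
def lfp_subsampling(sampling_rate_hz, n_channels, temporal_factor=2, spatial_factor=4):
    """Return (output rate in Hz, kept channel indices) for the LFP band.

    Factors of 0 or 1 mean "no subsampling", as described above.
    Assumption: the channel stride starts at channel 0.
    """
    t = temporal_factor if temporal_factor > 1 else 1
    s = spatial_factor if spatial_factor > 1 else 1
    kept_channels = list(range(0, n_channels, s))
    return sampling_rate_hz / t, kept_channels


# Assumed input: a 2500 Hz LFP stream with 384 channels.
rate, channels = lfp_subsampling(2500.0, 384)
rate_all, channels_all = lfp_subsampling(2500.0, 384, temporal_factor=0, spatial_factor=1)
```

With the default factors this gives half the sample rate and one quarter of the channels, which reduces the written LFP data by roughly a factor of eight.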

Visualization Parameters¶

Parameter file section (visualization):

{
    "job_kwargs": {
        "chunk_duration": "1s",
        "progress_bar": false
    },
    "timeseries": {
        "n_snippets_per_segment": 2,
        "snippet_duration_s": 0.5
    },
    "drift": {
        "detection": {
            "peak_sign": "neg",
            "detect_threshold": 5,
            "exclude_sweep_ms": 0.1
        },
        "localization": {
            "ms_before": 0.1,
            "ms_after": 0.3,
            "radius_um": 100.0
        },
        "n_skip": 30,
        "alpha": 0.15,
        "vmin": -200,
        "vmax": 0,
        "cmap": "Greys_r",
        "figsize": [10, 10]
    },
    "motion": {
        "cmap": "Greys_r",
        "scatter_decimate": 15,
        "figsize": [15, 10]
    }
}

job_kwargs: Parallel processing chunk settings (same format as other steps).
timeseries.n_snippets_per_segment: Number of raw/preprocessed time-series snippet plots generated per recording segment.
timeseries.snippet_duration_s: Duration in seconds of each time-series snippet.
drift.detection.peak_sign: Polarity of peaks detected for the drift scatter plot ("neg" = negative peaks).
drift.detection.detect_threshold: Detection threshold in median-absolute-deviation (MAD) units. Only peaks above this threshold are included in the drift plot.
drift.detection.exclude_sweep_ms: Exclusion window in milliseconds around each detected peak to suppress double-detections.
drift.localization.ms_before / drift.localization.ms_after: Waveform window (ms) around each detected peak used for spike localisation in the drift plot.
drift.localization.radius_um: Radius in µm around each peak’s primary channel used when localising spikes for the drift scatter plot.
drift.n_skip: Decimation factor for the drift scatter plot. A value of 30 means only 1 in 30 detected spikes is plotted (to reduce render time on dense recordings).
drift.alpha: Transparency (alpha) of scatter points in the drift plot (0 = fully transparent, 1 = opaque).
drift.vmin / drift.vmax: Colour-axis limits for the drift colourmap (typically amplitude in µV).
drift.cmap: Matplotlib colourmap used for colouring drift scatter points.
drift.figsize: Figure size [width, height] in inches for the drift plot.
motion.cmap: Matplotlib colourmap used for the motion summary plot.
motion.scatter_decimate: Decimation factor for the motion scatter plot (same concept as drift.n_skip).
motion.figsize: Figure size [width, height] in inches for the motion plot.
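
The decimation controlled by drift.n_skip and motion.scatter_decimate amounts to keeping one point out of every N. A minimal sketch, with invented peak data and a hypothetical `decimate` helper:

```python
def decimate(points, n_skip):
    """Keep 1 in every `n_skip` points (n_skip <= 1 keeps everything)."""
    return points[::n_skip] if n_skip > 1 else list(points)


# Hypothetical detected peaks as (time_s, depth_um) pairs.
peaks = [(i * 0.01, 10.0 * (i % 40)) for i in range(3000)]
plotted = decimate(peaks, 30)
```

With n_skip set to 30, the 3000 example peaks are reduced to 100 plotted points.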

Full example with custom parameters¶

Here’s an example of running the pipeline with custom parameters:

DATA_PATH=$DATA RESULTS_PATH=$RESULTS \
nextflow -C nextflow_local.config run main_multi_backend.nf \
  --n_jobs 16 \
  --sorter kilosort4 \
  --job_dispatch_args="--input spikeglx --debug --debug-duration 120" \
  --preprocessing_args="--motion compute --motion-preset nonrigid_fast_and_accurate" \
  --nwb_ecephys_args="--skip-lfp"

This example:

Runs 16 parallel jobs
Uses Kilosort4 for spike sorting
Processes SpikeGLX data in debug mode, with the recording clipped to 120 seconds
Computes motion with the nonrigid_fast_and_accurate preset
Skips LFP export in NWB files