# JaneliaSciComp/dask_start

**Type:** subworkflow

Creates a Dask cluster and waits for it to be ready.

**Tags:** `dask`, `infrastructure`, `distributed`
## Module Information

### Inputs
| Name | Type | Description |
|---|---|---|
| meta_and_files | tuple | Channel containing metadata and files that need to be accessible by the cluster. Structure: [ val(meta), path(files)... ] |
| distributed | boolean | If true, create a distributed Dask cluster; if false, return empty context |
| dask_config | file | Path to Dask configuration file (optional) |
| dask_work_path | directory | Path to Dask work directory where cluster files will be stored (optional) |
| total_workers | integer | Number of total workers to start in the cluster |
| required_workers | integer | Minimum number of workers required before the cluster is considered ready |
| dask_worker_cpus | integer | Number of CPU cores allocated per worker |
| dask_worker_mem_gb | integer | Memory in GB allocated per worker |
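As a sketch, the `meta_and_files` input channel carries a metadata map together with the files the cluster must be able to read (the sample id and file path below are illustrative, not part of the module):

```nextflow
// Hypothetical input channel with structure [ val(meta), path(files)... ]
meta_and_files = Channel.of(
    [ [id: 'sample1'], file('data/sample1.n5') ]  // illustrative meta map and data path
)
```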
### Outputs
| Name | Type | Description |
|---|---|---|
| dask_context | tuple | Dask cluster context information. Structure: `[ val(meta), val(dask_info) ]`. If `distributed=true`, `dask_info` contains the cluster details; if `distributed=false`, it is an empty map. |

When the cluster is distributed, `dask_info` is a map containing:

- `scheduler_address`: address of the Dask scheduler
- `cluster_work_dir`: path to the cluster work directory
- `available_workers`: number of workers that joined the cluster
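For example, a downstream step might extract the scheduler address from `dask_info` so that worker processes know where to connect (the field names match the documentation above; the channel wiring itself is an illustrative sketch):

```nextflow
// Hypothetical consumer of the dask_context output
DASK_START.out.dask_context
    | map { meta, dask_info ->
        // dask_info is empty when distributed=false, so guard the access
        def scheduler = dask_info?.scheduler_address ?: ''
        [ meta, scheduler ]
    }
```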
## Quick Start
Include this subworkflow in your Nextflow pipeline. Nextflow `include` statements take a local path, so first copy the subworkflow from https://github.com/JaneliaSciComp/nextflow-modules/tree/main/subworkflows/janelia/dask_start into your project, then include it (the exact path below assumes the repository layout is mirrored under your project root):

```nextflow
include { DASK_START } from './subworkflows/janelia/dask_start/main'
```
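A minimal invocation wiring up the inputs listed above might look like this. The argument order follows the Inputs table, and all concrete values (paths, worker counts, memory) are illustrative assumptions, not defaults of the module:

```nextflow
workflow {
    // Illustrative meta map and data path
    meta_and_files = Channel.of([ [id: 'sample1'], file('data/sample1.n5') ])

    DASK_START(
        meta_and_files,
        true,                           // distributed: create a real cluster
        file('conf/dask-config.yml'),   // dask_config (hypothetical path, optional)
        file('work/dask'),              // dask_work_path (hypothetical path, optional)
        4,                              // total_workers
        2,                              // required_workers
        4,                              // dask_worker_cpus
        16                              // dask_worker_mem_gb
    )
}
```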