# BioImageTools/spark_start
**Type:** subworkflow

Starts Spark processing, either by spinning up a Spark cluster or by setting up variables so that processing can run locally as individual jobs.

**Keywords:** spark, bigdata, infrastructure
## Module Information

### Inputs
| Name | Type | Description |
|---|---|---|
| ch_meta | tuple | Channel of tuples in which the first item is the meta map, which must contain a `spark_work_dir` field. Structure: `[ val(meta), [ files ] ]` |
| data_dir | path | Data path(s) to be mounted into the Spark workers for data access |
| spark_cluster | boolean | Whether or not to spin up a Spark cluster |
| spark_workers | integer | Number of workers in the cluster |
| spark_worker_cores | integer | Number of cores per Spark worker |
| spark_gb_per_core | integer | Number of GB of memory per worker core |
| spark_driver_cores | integer | Number of cores for the Spark driver |
| spark_driver_memory | string | Memory specification for the Spark driver |
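
As an illustration, here is a minimal sketch of a channel that satisfies the `ch_meta` contract; the `id` key, directory paths, and file names are hypothetical placeholders:

```nextflow
// Hypothetical ch_meta channel: one tuple of [ meta map, file list ].
// Only the "spark_work_dir" key is required by the subworkflow; "id" and
// the paths below are illustrative.
ch_meta = Channel.of(
    tuple(
        [ id: 'sample1', spark_work_dir: '/scratch/spark/sample1' ],
        [ file('/data/sample1/image.n5') ]
    )
)
```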
### Outputs
| Name | Type | Description |
|---|---|---|
| spark_context | tuple | The tuple from the input ch_meta with the spark_context map appended. Structure: `[ val(meta), [ files ], val(spark_context) ]` |
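
Downstream logic can read the appended map from the output channel. A sketch follows; the keys inside `spark_context` (for example a master URI) depend on the subworkflow's implementation and are not documented here:

```nextflow
// Sketch: inspect the spark_context map appended by SPARK_START.
// The contents of spark_context are implementation-defined.
SPARK_START.out.spark_context.view { meta, files, spark_context ->
    "meta=${meta.id} spark_context=${spark_context}"
}
```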
## Quick Start
Include this subworkflow in your Nextflow pipeline. Note that Nextflow's `include` statement takes a local path, not a URL, so first fetch the subworkflow from https://github.com/BioImageTools/nextflow-modules/tree/main/subworkflows/bits/spark_start into your pipeline's source tree, then include it (adjusting the path to match your layout):

```nextflow
include { SPARK_START } from './subworkflows/bits/spark_start/main'
```
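
A sketch of a complete invocation, passing the inputs in the order listed in the table above; the values are illustrative, and the positional signature should be checked against the subworkflow's `main.nf`:

```nextflow
workflow {
    // Hypothetical input channel; see the Inputs table for the contract.
    ch_meta = Channel.of(
        tuple(
            [ id: 'sample1', spark_work_dir: '/scratch/spark/sample1' ],
            [ file('/data/sample1/image.n5') ]
        )
    )

    SPARK_START(
        ch_meta,   // ch_meta: [ val(meta), [ files ] ]
        '/data',   // data_dir: mounted into the Spark workers
        true,      // spark_cluster: spin up a cluster
        4,         // spark_workers
        8,         // spark_worker_cores
        4,         // spark_gb_per_core
        2,         // spark_driver_cores
        '8g'       // spark_driver_memory
    )
}
```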