BioImageTools/spark_start

subworkflow

Starts Spark processing, either by spinning up a Spark cluster or by setting up variables so that processing can run locally as individual jobs.

Keywords: spark, bigdata, infrastructure

Module Information

Repository: https://github.com/BioImageTools/nextflow-modules/tree/main/subworkflows/bits/spark_start
Source: BioImageTools
Organization: BioImageTools
Authors: @krokicki, @cgoina

Inputs

| Name | Type | Description |
|------|------|-------------|
| ch_meta | tuple | Channel of tuples where the first item is the meta map, which contains a "spark_work_dir" field. Structure: `[ val(meta), [ files ] ]` |
| data_dir | path | Paths to be mounted in the Spark workers for data access |
| spark_cluster | boolean | Whether or not to spin up a Spark cluster |
| spark_workers | integer | Number of workers in the cluster |
| spark_worker_cores | integer | Number of cores per Spark worker |
| spark_gb_per_core | integer | Number of GB of memory per worker core |
| spark_driver_cores | integer | Number of cores for the Spark driver |
| spark_driver_memory | string | Memory specification for the Spark driver |

Outputs

| Name | Type | Description |
|------|------|-------------|
| spark_context | tuple | The tuple from input ch_meta with the spark_context map appended. Structure: `[ val(meta), [ files ], val(spark_context) ]` |
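Putting the inputs and outputs together, an invocation might look like the sketch below. The channel contents, parameter values, and the positional argument order (taken from the Inputs table above) are assumptions for illustration:

```nextflow
workflow {
    // Hypothetical input channel: [ meta, files ] tuples where the meta map
    // carries the required "spark_work_dir" field.
    ch_meta = Channel.of(
        [ [ id: 'sample1', spark_work_dir: '/scratch/spark/sample1' ],
          [ file('/data/sample1') ] ]
    )

    SPARK_START(
        ch_meta,
        [ '/data' ],   // data_dir: paths mounted into the Spark workers
        true,          // spark_cluster: spin up a cluster
        4,             // spark_workers
        8,             // spark_worker_cores
        4,             // spark_gb_per_core
        1,             // spark_driver_cores
        '2g'           // spark_driver_memory
    )

    // Each emitted tuple now has the spark_context map appended.
    SPARK_START.out.spark_context.view()
}
```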

Quick Start

Include this subworkflow in your Nextflow pipeline. Note that Nextflow's `include` statement takes a local script path, not a URL, so first copy or install the subworkflow into your pipeline source tree (for example under `subworkflows/bits/spark_start`), then include it relative to the importing script (assuming the subworkflow's entry script is `main.nf`):

```nextflow
include { SPARK_START } from './subworkflows/bits/spark_start/main'
```