JaneliaSciComp/spark_start

subworkflow

Starts Spark processing, either by spinning up a Spark cluster or by setting up the variables needed to run the processing locally as individual jobs.

Keywords: spark, bigdata, infrastructure

Module Information

Repository: https://github.com/JaneliaSciComp/nextflow-modules/tree/main/subworkflows/janelia/spark_start
Source: Janelia
Organization: JaneliaSciComp
Authors: @krokicki, @cgoina

Inputs

ch_meta (tuple): Channel of tuples containing a meta map and a list of data paths. Structure: [ val(meta), [path(data_paths)] ]
config (map): Additional Spark configuration properties.
spark_cluster (boolean): Whether or not to spin up a Spark cluster.
working_dir (path): Path shared by the workers, used for logging and jar distribution.
spark_workers (integer): Number of workers in the cluster.
min_workers (integer): Minimum number of Spark workers that must be available in the cluster.
spark_worker_cpus (integer): Number of CPUs per Spark worker.
spark_executor_cpus (integer): Number of CPUs per Spark executor.
spark_executor_mem_gb (integer): Memory in GB allocated to each Spark executor.
spark_executor_overhead_mem_gb (integer): Memory overhead in GB for each Spark executor.
spark_driver_cpus (integer): Number of CPUs for the Spark driver.
spark_driver_mem_gb (integer): Memory in GB for the Spark driver.
spark_gb_per_core (integer): Memory in GB per worker core.
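
The inputs are positional, so a call lists them in the order shown above. A minimal call sketch, assuming the subworkflow's take order matches this table; the numeric values and params.spark_work_dir are illustrative placeholders, not module defaults:

SPARK_START(
    ch_meta,                  // [ val(meta), [ path(data_paths) ] ]
    [:],                      // config: no additional Spark properties
    true,                     // spark_cluster: spin up a cluster
    params.spark_work_dir,    // working_dir shared for logging and jar distribution
    4,                        // spark_workers
    1,                        // min_workers
    4,                        // spark_worker_cpus
    4,                        // spark_executor_cpus
    12,                       // spark_executor_mem_gb
    1,                        // spark_executor_overhead_mem_gb
    1,                        // spark_driver_cpus
    2,                        // spark_driver_mem_gb
    4                         // spark_gb_per_core
)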

Outputs

spark_context (tuple): The tuple from the ch_meta input with the spark_context map appended. Structure: [ val(meta), val(spark_context) ]
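
Downstream Spark-aware steps receive the connection details through this channel. A small sketch of inspecting it, assuming the meta map carries an id field (the keys inside the spark_context map are defined by the subworkflow and are not listed here):

SPARK_START.out.spark_context.view { meta, spark ->
    "Spark context ready for ${meta.id}: ${spark}"
}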

Quick Start

Include this subworkflow in your Nextflow pipeline. Nextflow cannot include code directly from a URL, so first copy or install the subworkflow from the repository above into your project, then include it by its local path, for example:

include { SPARK_START } from './subworkflows/janelia/spark_start/main'
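
Each element of ch_meta pairs a meta map with a list of data paths. A minimal sketch of building that channel, where the id key and the data path are placeholders rather than requirements of the subworkflow:

ch_meta = Channel.of(
    [ [id: 'sample_1'], [ file('/path/to/dataset') ] ]
)

This channel is then passed as the first input to SPARK_START, as in the call sketch under Inputs.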