BioImageTools/spark_start

subworkflow

Starts Spark processing, either by spinning up a Spark cluster or by setting up variables so that processing can run locally as individual jobs.

Keywords: spark, bigdata, infrastructure

Module Information

Repository: https://github.com/BioImageTools/nextflow-modules/tree/main/subworkflows/bits/spark_start
Source: BioImageTools
Organization: BioImageTools
Authors: @krokicki, @cgoina

Inputs

| Name | Type | Description |
|------|------|-------------|
| ch_meta | tuple | Channel of tuples where the first item is the meta map, which contains a "spark_work_dir" field. Structure: `[ val(meta), [ files ] ]` |
| data_dir | path | Paths to be mounted in the Spark workers for data access |
| spark_cluster | boolean | Whether or not to spin up a Spark cluster |
| spark_workers | integer | Number of workers in the cluster |
| spark_worker_cores | integer | Number of cores per Spark worker |
| spark_gb_per_core | integer | Number of GB of memory per worker core |
| spark_driver_cores | integer | Number of cores for the Spark driver |
| spark_driver_memory | string | Memory specification for the Spark driver |

Outputs

| Name | Type | Description |
|------|------|-------------|
| spark_context | tuple | The tuple from input ch_meta with the spark_context map appended. Structure: `[ val(meta), [ files ], val(spark_context) ]` |
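Putting the inputs and outputs together, an invocation might look like the sketch below. The channel contents, parameter values, and the positional argument order (taken from the Inputs table above) are assumptions for illustration:

```nextflow
workflow {
    // Hypothetical input channel: [ meta, files ] tuples where the meta map
    // carries the required "spark_work_dir" field.
    ch_meta = Channel.of(
        [ [ id: 'sample1', spark_work_dir: '/scratch/spark/sample1' ],
          [ file('/data/sample1') ] ]
    )

    SPARK_START(
        ch_meta,
        [ '/data' ],   // data_dir: paths mounted into the Spark workers
        true,          // spark_cluster: spin up a cluster
        4,             // spark_workers
        8,             // spark_worker_cores
        4,             // spark_gb_per_core
        1,             // spark_driver_cores
        '2g'           // spark_driver_memory
    )

    // Each emitted tuple now has the spark_context map appended.
    SPARK_START.out.spark_context.view()
}
```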

Quick Start

Include this subworkflow in your Nextflow pipeline. Note that Nextflow's `include` statement takes a local script path, not a URL, so first copy or install the subworkflow into your pipeline source tree (for example under `subworkflows/bits/spark_start`), then include it relative to the importing script (assuming the subworkflow's entry script is `main.nf`):

```nextflow
include { SPARK_START } from './subworkflows/bits/spark_start/main'
```