Introduction
In what follows, we demonstrate how Viash, and later Viash Hub, allow anyone with a minimal set of technical skills to develop and run a solution for a simple task: run QC on a (potentially large) set of fastq files and combine all resulting QC reports into a single (MultiQC) report.
A bioinformatician could use fastqc in combination with a shell for loop (e.g. in bash). This, however, would not run in parallel. Command-line tools exist to parallelize such tasks, but the ones we know of can hardly be called easy to use.
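For comparison, the sequential approach could look roughly like the sketch below. It assumes fastqc and multiqc are on the PATH and requires bash 4 or later (for the globstar option); the function name run_qc_sequential and its arguments are hypothetical choices for illustration. Each fastqc call only starts after the previous one has finished, which is why this approach does not scale well to large sets of files.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Enable ** recursion (bash 4+); unmatched globs expand to nothing
shopt -s globstar nullglob

# Hypothetical helper: run fastqc on every fastq.gz file below $1, one file
# at a time, then aggregate the individual reports with multiqc.
run_qc_sequential() {
  local input_dir="$1"
  local out_dir="$2"
  mkdir -p "$out_dir"
  for fq in "$input_dir"/**/*.fastq.gz; do
    # The loop waits for each fastqc call to finish before starting the next
    fastqc --outdir "$out_dir" "$fq"
  done
  # Combine all reports found in out_dir into one MultiQC report
  multiqc --outdir "$out_dir" "$out_dir"
}
```

Called as `run_qc_sequential testData qc_reports`, this processes the files strictly one after another.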
What if one could simply reuse existing functionality (Viash components or Nextflow modules) and combine it in a simple Nextflow pipeline to achieve this goal? That is where this demo project comes in: https://viash-hub.com/data-intuitive/viash_hub_demo/-/tree/v0.1?ref_type=heads.
Below, we run this pipeline on a test dataset in two ways: directly from Viash Hub and from a local copy. Screencasts demonstrate both approaches.
Test data
We will fetch test data from this repository: https://github.com/hartwigmedical/testdata:
git clone https://github.com/hartwigmedical/testdata testData
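Before launching the pipeline, it can be useful to check which files the input glob testData/**/*.fastq.gz will pick up. The sketch below uses bash's globstar option (bash 4 or later); the function name count_fastq is hypothetical. Nextflow resolves the quoted glob itself, so its matching rules may differ slightly from bash's; treat the count as a sanity check rather than a guarantee.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Enable ** recursion (bash 4+); unmatched globs expand to nothing
shopt -s globstar nullglob

# Hypothetical helper: list every fastq.gz file below the given directory
# (recursively) on stderr and print how many there are on stdout.
count_fastq() {
  local dir="$1"
  local files=("$dir"/**/*.fastq.gz)
  if (( ${#files[@]} > 0 )); then
    printf '%s\n' "${files[@]}" >&2
  fi
  echo "${#files[@]}"
}
```

For example, `count_fastq testData` lists the matching files and prints their number.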
Run directly from Viash Hub
To fetch the workflow from Viash Hub, add the following to ~/.nextflow/scm:
providers {
    vsh {
        platform = 'gitlab'
        server = "viash-hub.com"
    }
}
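One way to add this configuration from the command line is sketched below. It writes the providers block only when ~/.nextflow/scm does not exist yet, so an existing SCM configuration is never overwritten; in that case the vsh entry has to be merged in by hand. The function name add_vsh_scm_provider is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create ~/.nextflow/scm with a 'vsh' provider entry
# pointing at Viash Hub, unless the file already exists.
add_vsh_scm_provider() {
  local scm_file="${HOME}/.nextflow/scm"
  if [[ -e "$scm_file" ]]; then
    echo "$scm_file already exists; add the vsh provider manually" >&2
    return 1
  fi
  mkdir -p "$(dirname "$scm_file")"
  # Quoted heredoc delimiter: the content is written verbatim, no expansion
  cat > "$scm_file" <<'EOF'
providers {
    vsh {
        platform = 'gitlab'
        server = "viash-hub.com"
    }
}
EOF
}
```

Running `add_vsh_scm_provider` once is enough; a second call reports that the file exists and leaves it untouched.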
Then, with the data fetched above present under testData, we can run fastqc in parallel on all 32 fastq files:
nextflow run data-intuitive/viash_hub_demo \
-hub vsh \
-main-script target/nextflow/workflows/parallel_qc/main.nf \
-r main \
--input "testData/**/*.fastq.gz" \
--publish_dir output \
-with-docker
The output will be stored under output, as indicated by the --publish_dir argument.
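To get a quick overview of what was published, one can list the HTML reports under the output directory. This sketch assumes the reports are written as .html files somewhere below output but does not assume specific file names, since those are determined by the workflow; the function name list_reports is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list all HTML files below a publish directory, sorted
# by path (presumably including the combined MultiQC report).
list_reports() {
  local publish_dir="$1"
  find "$publish_dir" -type f -name '*.html' | sort
}
```

For example, `list_reports output` prints one path per report.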
Screencast of fetching test data and running the pipeline from Viash Hub directly:
Run from a local copy
First, build the workflow components and fetch their dependencies:
❯ viash ns build
temporaryFolder: /tmp/viash_hub_repo5484030342718552259 uri: https://github.com/openpipelines-bio/openpipeline.git
Cloning into '.'...
checkout out: List(git, checkout, tags/0.12.1, --, .) 0
Creating temporary 'target/.build.yaml' file for op as this file seems to be missing.
Exporting parallel_qc (workflows) =nextflow=> <...>/demo/target/nextflow/workflows/parallel_qc
Exporting transpose (utils) =nextflow=> <...>/demo/target/nextflow/utils/transpose
All 2 configs built successfully
Now, run fastqc on all fastq files found under the testData directory:
❯ nextflow run target/nextflow/workflows/parallel_qc/main.nf \
--input "testData/**/*.fastq.gz" \
--publish_dir output \
-with-docker
Screencast of fetching test data and running the pipeline from a local copy:
Done!