Nextflow is a leading workflow manager in computational biology, known for its portability, scalability, and reproducibility. However, developing workflows can be challenging due to a high barrier to entry and labour-intensive processes. Combining Nextflow with Viash streamlines development, enhances reusability, and simplifies the creation of state-of-the-art workflows.
Key advantages of Nextflow
In computational biology, Nextflow is one of the most popular workflow managers, and with good reason:
Portability: Once a Nextflow workflow has been implemented, it is easy to set up a new system to run that workflow. That system can be any of many different platforms (Nextflow calls these executors), including laptops, desktops, High-Performance Computing (HPC) clusters and dynamic cloud infrastructure.
Scalability: When the workflow needs to be run on large cohorts of datasets, the computations can be scaled out to multiple compute nodes automatically (depending on the chosen executor). In theory, this can reduce the execution time from 1 month on a single system to about 43 minutes on a compute cluster with 1000 nodes, assuming the work parallelises perfectly.
Reproducibility: Being able to run a computational workflow multiple times with the same inputs and obtain the same or similar results remains a major challenge in computational biology. This is even more the case when the reruns are months or years apart. Nextflow integrates easily with containerisation technologies such as Docker, Podman and Singularity, which helps ensure that results stay reproducible across runs.
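The scalability figure above follows from simple arithmetic, which the sketch below makes explicit. It is a best-case estimate only: it assumes one month means 30 days (the text does not say) and that the work divides evenly across nodes with no scheduling, I/O or data-transfer overhead. The function name ideal_runtime_minutes is hypothetical and not part of Nextflow or Viash.

```python
MINUTES_PER_DAY = 24 * 60


def ideal_runtime_minutes(single_node_days: float, nodes: int) -> float:
    """Best-case wall-clock time in minutes when work splits evenly across nodes.

    Assumes perfect parallelism: no scheduling, I/O, or data-transfer overhead.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return single_node_days * MINUTES_PER_DAY / nodes


# Assumption: "1 month" is taken as 30 days.
print(ideal_runtime_minutes(30, 1000))  # 43.2
```

In practice the speed-up will be smaller, since real workflows contain steps that cannot run in parallel and clusters add queueing and data-staging time.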
Drawbacks of Nextflow
Unfortunately, developing a Nextflow pipeline is not as easy as using one:
High barrier to entry: Not all of the desired properties listed above are actually ‘turned on’ by default. Creating a Nextflow workflow that is on par with the current state of the art requires a significant amount of domain knowledge, which limits collaboration between developers with varying backgrounds and levels of pipeline development skill.
Labour intensive: Developing a new workflow can be quite labour intensive, as the necessary code is verbose and contains a lot of boilerplate.
Advantages of Viash+Nextflow = VDSL3
Luckily, Viash can help you wrap your code into a state-of-the-art Nextflow script called a VDSL3 module, thereby solving the aforementioned drawbacks. Viash also provides benefits that are not specific to Nextflow pipeline development, such as:
Reusability: Viash components can not only be used as a step in a Nextflow pipeline, but also as a standalone command-line utility.
Test-driven development: Viash encourages test-driven development where unit tests are written before new functionality is developed.
Separation of concerns: Developers do not need any knowledge of Nextflow to start creating Viash components. Somebody else can then write a Nextflow workflow consisting of the VDSL3 modules generated from the Viash components.
Continuous testing: Viash offers helper scripts to automatically run component tests whenever commits are pushed by one of the developers. This allows catching bugs earlier, thereby preventing costly long-term bugs.