Installation Instructions

Introduction

LuciusOperations uses Viash to convert relatively simple scripts containing REST calls into configurable CLI tools.

A full discussion of Viash and its other applications is outside the scope of this document. Suffice it to say that Viash distinguishes between source code and the artifacts built from it. The source of the components can be found under src/; after running the bin/build.sh wrapper script, the built executables end up under utils/.

bin/build.sh

>> Building both namespaces
Exporting create_context (processing) =native=> utils/processing
Exporting upload_jar (processing) =native=> utils/processing
Exporting remove_context (processing) =native=> utils/processing
Exporting fetch_jar (processing) =native=> utils/processing
Exporting check (processing) =native=> utils/processing
Exporting process (processing) =native=> utils/processing
Not all configs built successfully
  7/13 configs were disabled
  6/13 configs built successfully
Exporting create_context (api) =native=> utils/api
Exporting upload_jar (api) =native=> utils/api
Exporting initialize (api) =native=> utils/api
Exporting remove_context (api) =native=> utils/api
Exporting fetch_jar (api) =native=> utils/api
Exporting check (api) =native=> utils/api
Not all configs built successfully
  7/13 configs were disabled
  6/13 configs built successfully
Exporting workflow (load) =native=> utils/load
Not all configs built successfully
  12/13 configs were disabled
  1/13 configs built successfully

>> Please see under utils/ to find the tools for api and processing...

This description deals with the installation and configuration of LuciusOperations only.

Where to run

Generic installation

It does not matter where the tools in this toolset run: as long as wget and curl are available on the device and the DNS name of the Spark Jobserver endpoint can be resolved, they should work.
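A quick way to verify these prerequisites is a small shell check like the sketch below (it only tests for the two required tools; resolving the Jobserver hostname is left commented out since the DNS name depends on your installation):

```shell
# Check that the required download tools are on the PATH.
for tool in wget curl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done

# To check DNS resolution, substitute your Jobserver's hostname (placeholder below):
# getent hosts jobserver.example.com || echo "cannot resolve Jobserver host"
```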

In order to run the processing jobs in api and processing, however, it is important to collect the appropriate JAR files. Those define the logic (i.e. the code) that actually runs on the Spark cluster. Both the api and the processing toolsets have an upload_jar tool that can be used to upload the appropriate JAR file to the Spark Jobserver.
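Under the hood, uploading a JAR is a plain REST call. As a sketch (the endpoint, application name, and JAR filename below are placeholders, not values from this repository), the equivalent raw curl call against the classic Spark Jobserver /jars route looks like:

```shell
# Placeholders: substitute your own Jobserver endpoint, application name and JAR file.
ENDPOINT="http://localhost:8090"
APP="luciusapi"
JAR="luciusapi.jar"

# Spark Jobserver accepts the JAR as the raw body of a POST to /jars/<appName>.
UPLOAD_CMD="curl --data-binary @${JAR} ${ENDPOINT}/jars/${APP}"
echo "${UPLOAD_CMD}"
```

In practice you would use the upload_jar tools instead, which take care of argument handling and defaults.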

We will often run the LuciusOperations tools from the Spark Jobserver instance itself. This allows us to use the (locally) available JAR files that were used to initialize the API in the first place. How to connect to a Spark Jobserver instance depends on your installation.

Suggested installation

We suggest running the LuciusOperations tools from an instance running the Lucius backend. Nothing has to be installed in that case; please see here for more information.

Technical details

Every Viash component contains a script.sh file (usually some parameter handling and a command to execute) and a config.vsh.yaml file that contains the configuration for the component.

By running bin/build.sh, this combination of files is transformed into one executable script that does essentially two things:

  1. The resulting executable contains the argument parsing necessary to run it from the CLI.
  2. The defaults configured in _viash.yaml are applied.

An illustration of the CLI argument parsing capabilities:

utils/processing/check -h | head
check dev

Arguments:
    --endpoint
        type: string
        default: http://localhost:8090
        The endpoint (URL) to connect to

    --application
        type: string
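Based on the arguments above, overriding the defaults from the CLI could look like the following sketch (the endpoint hostname and application name are hypothetical placeholders; the echo only prints the command line so it can be inspected before the built tool is actually available):

```shell
# Hypothetical invocation of the built check tool with non-default values.
ARGS=(--endpoint "http://jobserver.internal:8090" --application "luciusapi")
echo "utils/processing/check ${ARGS[*]}"
```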

The format of the configuration in _viash.yaml is derived from the way jq allows one to query and update JSON blobs. The example in the repository should make clear how to use it:

source: src
target: utils

config_mods: |
  .functionality.version := 'dev'
  .functionality.arguments[.name == '--db'].default := '/Users/toni/Dropbox/_GSK/2022/output-data'
  .functionality.arguments[.name == '--db_version'].default := 0
  .functionality.arguments[.name == '--input'].default := '/Users/toni/Dropbox/_GSK/2022/l1k_l5_subset'
  .functionality.arguments[.name == '--application'].default := 'luciusapi'
  .functionality.arguments[.name == '--geneAnnotations'].default := '/Users/toni/Dropbox/_GSK/2022/l1k_l5_subset/l1k_gene_xref/l1k_gene_xref.parquet'
  .functionality.arguments[.name == '--treatmentAnnotations'].default := '/Users/toni/Dropbox/_GSK/2022/l1k_l5_subset/pert_xref/pert_xref.parquet'
  .functionality.arguments[.name == '--cellAnnotations'].default := '/Users/toni/Dropbox/_GSK/2022/l1k_l5_subset/cell_xref/cell_xref.parquet'
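Following the same selector pattern, a deployment-specific override can be added as an extra config_mods line. For instance, to change the --endpoint default shown earlier (the hostname below is hypothetical):

```
.functionality.arguments[.name == '--endpoint'].default := 'http://jobserver.internal:8090'
```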

The bin/build.sh script uses Viash to create subdirectories under utils/ containing the tools for the processing part and the api part (plus the load workflow):

tree utils/
utils/
├── api
│   ├── check
│   ├── create_context
│   ├── fetch_jar
│   ├── initialize
│   ├── remove_context
│   └── upload_jar
├── load
│   └── workflow
├── native
│   └── load
│       └── workflow
│           └── workflow
└── processing
    ├── check
    ├── create_context
    ├── fetch_jar
    ├── process
    ├── remove_context
    └── upload_jar

7 directories, 14 files

Please refer to the usage guide for more information about how to use the tools under utils/.