Writing Workflows
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How do I connect tools together into a workflow?
Objectives
Learn how to construct workflows from multiple CWL tool descriptions.
This workflow extracts a java source file from a tar file and then compiles it.
1st-workflow.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
inputs:
inp: File
ex: string
outputs:
classout:
type: File
outputSource: compile/classfile
steps:
untar:
run: tar-param.cwl
in:
tarfile: inp
extractfile: ex
out: [example_out]
compile:
run: arguments.cwl
in:
src: untar/example_out
out: [classfile]
Use a JSON object in a separate file to describe the input of a run:
1st-workflow-job.yml
inp:
class: File
path: hello.tar
ex: Hello.java
Now invoke cwl-runner
with the tool wrapper and the input object on the
command line:
$ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java
$ cwl-runner 1st-workflow.cwl 1st-workflow-job.yml
[job untar] /tmp/tmp94qFiM$ tar xf /home/example/hello.tar Hello.java
[step untar] completion status is success
[job compile] /tmp/tmpu1iaKL$ docker run -i --volume=/tmp/tmp94qFiM/Hello.java:/var/lib/cwl/job301600808_tmp94qFiM/Hello.java:ro --volume=/tmp/tmpu1iaKL:/var/spool/cwl:rw --volume=/tmp/tmpfZnNdR:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac -d /var/spool/cwl /var/lib/cwl/job301600808_tmp94qFiM/Hello.java
[step compile] completion status is success
[workflow 1st-workflow.cwl] outdir is /home/example
Final process status is success
{
"classout": {
"location": "/home/example/Hello.class",
"checksum": "sha1$e68df795c0686e9aa1a1195536bd900f5f417b18",
"class": "File",
"size": 416
}
}
What’s going on here? Let’s break it down:
cwlVersion: v1.0
class: Workflow
The cwlVersion
field indicates the version of the CWL spec used by the
document. The class
field indicates this document describes a workflow.
inputs:
inp: File
ex: string
The inputs
section describes the inputs of the workflow. This is a
list of input parameters where each parameter consists of an identifier
and a data type. These parameters can be used as sources for input to
specific workflows steps.
outputs:
classout:
type: File
outputSource: compile/classfile
The outputs
section describes the outputs of the workflow. This is a
list of output parameters where each parameter consists of an identifier
and a data type. The outputSource
connects the output parameter classfile
of the compile
step to the workflow output parameter classout
.
steps:
untar:
run: tar-param.cwl
in:
tarfile: inp
extractfile: ex
outputs: [example_out]
The steps
section describes the actual steps of the workflow. In this
example, the first step extracts a file from a tar file, and the second
step compiles the file from the first step using the java compiler.
Workflow steps are not necessarily run in the order they are listed,
instead the order is determined by the dependencies between steps (using
source
). In addition, workflow steps which do not depend on one
another may run in parallel.
The first step, untar
runs tar-param.cwl
(described previously in
Parameter references). This tool has two input parameters, tarfile
and extractfile
and one output parameter example_out
.
The inputs
section of the workflow step connects these two input parameters to
the inputs of the workflow, inp
and ex
using source
. This means that when
the workflow step is executed, the values assigned to inp
and ex
will be
used for the parameters tarfile
and extractfile
in order to run the tool.
The outputs
section of the workflow step lists the output parameters that are
expected from the tool.
compile:
run: arguments.cwl
in:
src: untar/example_out
outputs: [classfile]
The second step compile
depends on the results from the first step by
connecting the input parameter src
to the output parameter of untar
using
untar/example_out
. The output of this step classfile
is connected to the
outputs
section for the Workflow, described above.
Key Points
Each step in a workflow must have its own CWL description.
Top level inputs and outputs of the workflow are described in the
inputs
andoutputs
fields respectively.The steps are specified under
steps
.Execution order is determined by the flow of inputs and outputs between steps.