Below are a set of recommended good practices to keep in mind when writing a Common Workflow Language description for a tool or workflow. These guidelines are presented for consideration on a scale of usefulness: more is better, not all are required.
☐ No type: string
parameters for names of input or reference files/directories; use type: File
or type: Directory
as appropriate.
☐ Include a license that allows for re-use by anyone, e.g. Apache 2.0. If possible, the license should be specified with its corresponding [SPDX identifier][spdx]. Construct the metadata field for the licence by providing a URL of the form https://spdx.org/licenses/[SPDX-ID]
where SPDX-ID
is the taken from the list of identifiers linked above. See the example snippet below for guidance. For non-standard licenses without an SPDX identifier, provide a URL to the license.
Example of metadata field for license with SPDX identifier:
$namespaces:
s: http://schema.org/
s:license: https://spdx.org/licenses/Apache-2.0
# other s: declarations
For more examples of providing metadata within CWL descriptions, see the Metadata and Authorship section of this User Guide.
☐ Include attribution information for the author(s) of the CWL tool or workflow description. Use unambiguous identifiers like ORCID.
☐ In tool descriptions, list dependencies using short name(s) under SoftwareRequirement
.
☐ Include SciCrunch identifiers for dependencies in https://identifiers.org/rrid/RRID:SCR_NNNNNN
format.
☐ All input
and output
identifiers should reflect their conceptual identity. Use informative names like unaligned_sequences
, reference_genome
, phylogeny
, or aligned_sequences
instead of foo_input
, foo_file
, result
, input
, output
, and so forth.
☐ In tool descriptions, include a list of version(s) of the tool that are known to work with this description under SoftwareRequirement
.
☐ format
should be specified for all input and output File
s. Bioinformatics tools should use format identifiers from EDAM. See also iana:text/plain
, iana:text/tab-separated-values
with $namespaces: { iana: "https://www.iana.org/assignments/media-types/" }
. Full IANA media type list (also known as MIME types). For non-bioinformatics tools use or build an appropriate ontology/controlled vocabulary in the same way. Please edit this page to let us know about it.
☐ Mark all input and output File
s that are read from or written to in a streaming compatible way (only once, no random-access), as streamable: true
.
☐ Each CommandLineTool
description should focus on a single operation only, even if the (sub)command is capable of more. Don’t overcomplicate your tool descriptions with options that you don’t need/use.
☐ Custom types should be defined with one external YAML per type definition for re-use.
☐ Include a top level short label
summarising the tool/workflow.
☐ If useful, include a top level doc
as well. This should provide a longer, more detailed description than was provided in the top level label
(see above).
☐ Use type: enum
instead of type: string
for elements with a fixed list of valid values.
☐ Evaluate all use of JavaScript for possible elimination or replacement. One common example: manipulating File
names and paths? Consider whether one of the built in File
properties like basename
, nameroot
, nameext
, etc, could be used instead.
☐ Give the tool description to a colleague (preferably at a different institution) to test and provide feedback.
☐ CWL implementations which also implement SubworkflowFeatureRequirement
can support nesting workflows as a step within others. Complex workflows with individual components which can be abstracted should utilise this to make their workflow modular and allow sections of them to be easily reused.