## What is the key idea of Snakemake workflows?

The key idea is very similar to GNU Make. The workflow is determined automatically from top (the files you want) to bottom (the files you have) by applying very general rules with wildcards that you give to Snakemake.

When you start using Snakemake, please make sure to walk through the official tutorial. It is crucial to understand how to properly use the system.

## My shell command fails with errors about an “unbound variable”, what’s wrong?

This often happens when calling virtual environments from within Snakemake. Snakemake uses bash strict mode to ensure, e.g., proper error behavior of shell scripts. Unfortunately, virtualenv and some other tools violate bash strict mode. The quick fix for virtualenv is to temporarily deactivate the check for unbound variables:

set +u; source /path/to/venv/bin/activate; set -u

For more details on bash strict mode, see here.

## How do I run my rule on all files of a certain directory?

In Snakemake, similar to GNU Make, the workflow is determined from the top, i.e. from the target files. Imagine you have a directory with files 1.fastq, 2.fastq, 3.fastq, ..., and you want to produce files 1.bam, 2.bam, 3.bam, .... You should specify these as target files, using the ids 1, 2, 3, .... You could end up with at least two rules like this (or any number of intermediate steps):

IDS = "1 2 3 ...".split() # the list of desired ids

# a pseudo-rule that collects the target files
rule all:
    input:  expand("otherdir/{id}.bam", id=IDS)

# a general rule using wildcards that does the work
rule:
    input:  "thedir/{id}.fastq"
    output: "otherdir/{id}.bam"
    shell:  "..."

Snakemake will then go down the line and determine which files it needs from your initial directory.
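To make the role of the pseudo-rule's input concrete, here is a minimal plain-Python sketch of the list that the expand call above evaluates to for a single wildcard. The concrete ids are hypothetical, and the list comprehension only mimics expand for this simple case; it is not Snakemake's implementation.

```python
IDS = "1 2 3".split()  # a concrete, hypothetical list of ids

# Roughly what expand("otherdir/{id}.bam", id=IDS) evaluates to:
targets = ["otherdir/{id}.bam".format(id=i) for i in IDS]
print(targets)  # ['otherdir/1.bam', 'otherdir/2.bam', 'otherdir/3.bam']
```

Each entry in this list is a target file, and the wildcard rule is matched against each of them to find the required .fastq input.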

In order to infer the IDs from present files, Snakemake provides the glob_wildcards function, e.g.

IDS, = glob_wildcards("thedir/{id}.fastq")

The function matches the given pattern against the files present in the filesystem and thereby infers the values for all wildcards in the pattern. A named tuple that contains a list of values for each wildcard is returned. Here, this named tuple has only one item, that is the list of values for the wildcard {id}.
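The matching described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not Snakemake's implementation: the hypothetical helper infer_wildcards works on a given list of paths instead of the filesystem and returns a dictionary rather than a named tuple.

```python
import re


def infer_wildcards(pattern, paths):
    """Collect the value of each {wildcard} in `pattern` for every matching path."""
    # re.split with a capturing group alternates literal text and wildcard names,
    # e.g. "thedir/{id}.fastq" -> ["thedir/", "id", ".fastq"].
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    values = {name: [] for name in parts[1::2]}
    for path in paths:
        match = re.fullmatch(regex, path)
        if match:
            for name, value in match.groupdict().items():
                values[name].append(value)
    return values


# Hypothetical directory listing; the last file does not match the pattern.
files = ["thedir/1.fastq", "thedir/2.fastq", "thedir/notes.txt"]
IDS = infer_wildcards("thedir/{id}.fastq", files)["id"]
print(IDS)  # ['1', '2']
```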

## Snakemake complains about a cyclic dependency or a PeriodicWildcardError. What can I do?

One limitation of Snakemake is that graphs of jobs have to be acyclic (similar to GNU Make). This means that no path in the graph may be a cycle. Although you might have considered this when designing your workflow, Snakemake sometimes runs into situations where a cyclic dependency cannot be avoided without further information, even though the solution seems obvious to the developer. Consider the following example:

rule all:
    input:
        "a"

rule unzip:
    input:
        "{sample}.tar.gz"
    output:
        "{sample}"
    shell:
        "tar -xf {input}"

If this workflow is executed with

snakemake -n

two things may happen.

1. If the file a.tar.gz is present in the filesystem, Snakemake will propose the following (expected and correct) plan:

rule unzip:
    input: a.tar.gz
    output: a
    wildcards: sample=a
localrule all:
    input: a
Job counts:
    count   jobs
    1       all
    1       unzip
    2

2. If the file a.tar.gz is not present and cannot be created by any rule other than rule unzip, Snakemake will try to run rule unzip again, with {sample}=a.tar.gz. This would go on recursively without end. Snakemake detects this case and produces a PeriodicWildcardError.

In summary, PeriodicWildcardErrors hint at a problem where a rule or a set of rules can be applied to create its own input. If you are lucky, Snakemake can be smart and avoid the error by stopping the recursion if a file exists in the filesystem. Importantly, however, bugs upstream of that rule can manifest as a PeriodicWildcardError, even though in reality a file is merely missing or named differently. In such cases, it is best to restrict the wildcard of the output file(s), or to follow the general rule of putting output files of different rules into unique subfolders of your working directory. This way, you can discover the true source of your error.
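The recursion, and the effect of restricting the wildcard, can be illustrated with a small plain-Python sketch. It is not how Snakemake resolves jobs internally: the helper required_inputs is hypothetical, the wildcard is modelled as a regular expression, and the constraint assumes that sample names contain no dots. In a Snakefile, such a restriction is expressed with the wildcard_constraints directive.

```python
import re

# Output pattern "{sample}" of the example rule, without and with a constraint.
UNCONSTRAINED = r"(?P<sample>.+)"
CONSTRAINED = r"(?P<sample>[^.]+)"  # assumption: sample names contain no dots


def required_inputs(target, output_regex, depth=3):
    """Follow the chain of inputs the rule would request for `target`."""
    chain = []
    for _ in range(depth):
        match = re.fullmatch(output_regex, target)
        if match is None:
            break  # the rule cannot produce this file, so the recursion stops
        target = match.group("sample") + ".tar.gz"
        chain.append(target)
    return chain


print(required_inputs("a", UNCONSTRAINED))
# ['a.tar.gz', 'a.tar.gz.tar.gz', 'a.tar.gz.tar.gz.tar.gz']
print(required_inputs("a", CONSTRAINED))
# ['a.tar.gz']
```

With the constraint in place, a missing a.tar.gz is reported as a missing input file instead of triggering the periodic pattern.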

## Is it possible to pass variable values to the workflow via the command line?

Yes, this is possible. Have a look at Configuration. Previously, it was necessary to use environment variables, e.g. by writing

$ SAMPLES="1 2 3 4 5" snakemake

and having some Python code in the Snakefile that reads this environment variable, i.e.

import os

SAMPLES = os.environ.get("SAMPLES", "10 20").split()

## I get a NameError with my shell command. Are braces unsupported?

You can use the entire Python format minilanguage in shell commands. Braces in shell commands that are not intended to insert variable values thus have to be escaped by doubling them:

...
    shell: "awk '{{print $1}}' {input}"

Here the double braces are escapes, i.e. there will remain single braces in the final command. In contrast, {input} is replaced with an input filename.
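The escaping rule is the one used by Python's own str.format, so it can be tried outside of Snakemake. The following sketch uses a hypothetical file name in place of the value Snakemake would substitute for {input}. Plain str.format raises a KeyError for unescaped braces, whereas Snakemake reports a NameError, as in the question above.

```python
# Doubled braces survive formatting as single braces; {input} is substituted.
command = "awk '{{print $1}}' {input}".format(input="data.txt")
print(command)  # awk '{print $1}' data.txt

# Single braces are parsed as a replacement field named "print $1".
try:
    "awk '{print $1}' {input}".format(input="data.txt")
    unescaped_failed = False
except KeyError:
    unescaped_failed = True
print(unescaped_failed)  # True
```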

## How do I incorporate files that do not follow a consistent naming scheme?

The best solution is to have a dictionary that translates a sample id to the inconsistently named files and use a function (see Functions as Input Files) to provide an input file like this:

FILENAME = dict(...)  # map sample ids to the irregular filenames here

rule:
    # use a function as input to delegate to the correct filename
    input: lambda wildcards: FILENAME[wildcards.sample]
    output: "somefolder/{sample}.csv"
    shell: ...
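What the input function does for a requested output can be sketched in plain Python. The dictionary entries and file names below are hypothetical, and a SimpleNamespace stands in for the wildcards object that Snakemake passes to the function.

```python
from types import SimpleNamespace

# Hypothetical mapping from sample ids to irregularly named files.
FILENAME = {
    "s1": "raw/Sample_One-final.csv",
    "s2": "raw/second(2).csv",
}


def input_for(wildcards):
    """Same lookup as the lambda in the rule above."""
    return FILENAME[wildcards.sample]


# For the output "somefolder/s1.csv", the wildcard sample has the value "s1".
wildcards = SimpleNamespace(sample="s1")
print(input_for(wildcards))  # raw/Sample_One-final.csv
```

A sample id that is missing from the dictionary raises a KeyError, which makes an incomplete mapping visible early.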

## How do I force Snakemake to rerun all jobs from the rule I just edited?

This can be done by invoking Snakemake with the --forcerules or -R flag, followed by the rules that should be re-executed:

$ snakemake -R somerule

Here, somerule is a placeholder for the name of the rule you edited.

## How do I make my rule fail if an output file is empty?

Snakemake expects shell commands to behave properly, meaning that failures should cause a nonzero exit status. If a command exits with status zero, Snakemake assumes everything worked fine, even if output files are empty. This is because empty output files are also a reasonable way to indicate progress where no real output was produced. However, sometimes you will have to deal with tools that do not properly report their failure with an exit status. Here, the recommended way is to use bash to check for non-empty output files, e.g.:

rule:
    input:  ...
    output: "my/output/file.txt"
    shell:  "somecommand {input} {output} && [[ -s {output} ]]"

## How does Snakemake lock the working directory?

By default, Snakemake locks a working directory by output and input files. Two Snakemake instances that want to create the same output file cannot run at the same time. Two instances creating disjoint sets of output files can. With the command line option --nolock, you can disable this mechanism at your own risk. With --unlock, you can remove a stale lock. Stale locks can appear if your machine is powered off with a running Snakemake instance.

## Snakemake does not trigger re-runs if I add additional input files. What can I do?

Snakemake has a kind of “lazy” policy about added input files if their modification date is older than that of the output files. One reason is that information about what to do cannot be inferred just from the input and output files; additional information about the last run has to be stored. Since behaviour would be inconsistent between cases where that information is available and where it is not, this functionality has been encoded as an extra switch. To trigger updates for jobs with changed input files, you can use the command line argument --list-input-changes in the following way:

$ snakemake -n -R `snakemake --list-input-changes`

Here, snakemake --list-input-changes returns the list of output files with changed input files, which is fed into -R to trigger a re-run.

## How do I trigger re-runs for rules with updated code or parameters?

Similar to the solution above, you can use

$ snakemake -n -R `snakemake --list-params-changes`

and

$ snakemake -n -R `snakemake --list-code-changes`

Again, the list commands in backticks return the list of output files with changes, which are fed into -R to trigger a re-run.

## How do I remove all files created by Snakemake, i.e. like make clean?

To remove all files created by Snakemake as output files and start from scratch, you can use

$ rm $(snakemake --summary | tail -n+2 | cut -f1)

## Git is messing up the modification times of my input files, what can I do?

When you check out a git repository, the modification times of updated files are set to the time of the checkout. If you rely on these files as input and output files in your workflow, this can cause trouble. For example, Snakemake could think that a certain (git-tracked) output has to be regenerated, just because its input has been checked out a bit later. In such cases, it is advisable to set the file modification dates to the last commit date after an update has been pulled. See here for a solution to achieve this.
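One possible way to reset the modification dates is sketched below in Python. This is an assumption-laden illustration, not the solution referenced above: the function names are hypothetical, git must be available on the PATH, and the script is meant to be run from the root of the repository.

```python
import os
import subprocess


def last_commit_timestamp(path):
    """Return the Unix timestamp of the last commit touching `path`, or None."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else None


def set_mtime(path, timestamp):
    """Set access and modification time of `path` to `timestamp` (in seconds)."""
    os.utime(path, (timestamp, timestamp))


def restore_mtimes():
    """Reset the mtime of every git-tracked file to its last commit date."""
    tracked = subprocess.run(
        ["git", "ls-files", "-z"], capture_output=True, text=True, check=True
    ).stdout.split("\0")
    for path in filter(None, tracked):
        timestamp = last_commit_timestamp(path)
        if timestamp is not None and os.path.exists(path):
            set_mtime(path, timestamp)
```

Calling restore_mtimes() after each pull keeps the modification times of tracked files tied to their commit dates rather than to the time of the checkout. It runs one git log call per file, which can be slow for large repositories.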