Debugging workflows

Debugging can be challenging, especially for complex workflows with many rules and dependencies. The following tips and tools can help you troubleshoot workflows effectively:

Log files

Each Snakemake run will produce a log file in .snakemake/log/, mirroring the information printed to the console when running Snakemake. This log file can be especially helpful when running Snakemake in a non-interactive environment, e.g. when executing Snakemake as a cluster job or in a container.
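In a non-interactive setting it can be handy to locate the newest run log programmatically. The following is only a minimal sketch: it assumes that the run logs in .snakemake/log/ carry a .log suffix and that file modification time reflects run order. The helper name latest_snakemake_log is hypothetical and not part of Snakemake.

```python
from pathlib import Path
from typing import Optional


def latest_snakemake_log(workdir: str = ".") -> Optional[Path]:
    """Return the most recently modified log file in .snakemake/log/, or None."""
    log_dir = Path(workdir) / ".snakemake" / "log"
    # Assumption: run logs carry a .log suffix.
    logs = list(log_dir.glob("*.log")) if log_dir.is_dir() else []
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    log = latest_snakemake_log()
    if log is None:
        print("No Snakemake log found.")
    else:
        # Print the last 20 lines of the newest log.
        print("\n".join(log.read_text().splitlines()[-20:]))
```

Run from the workflow's working directory, this prints the tail of the newest log, or a short notice if no log exists yet.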

Logs of remotely executed jobs

Depending on the executor you are using, additional log files may be generated for each job. For the location thereof, please refer to the documentation of the respective executor plugin.

Redirecting STDERR of rules

When using the log directive on rules using script, you can add the following snippets to the rules’ scripts to redirect their STDERR to the specified log file:

For Python scripts:

import sys
sys.stderr = open(snakemake.log[0], "w", buffering=1)

Here, buffering=1 enables line buffering, so each STDERR line is written to the log file as soon as it is complete. Without it, output still held in a larger buffer may not reach the log file before an error terminates the script.
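The effect of line buffering can be checked outside of Snakemake. The sketch below is a standalone demonstration, not part of a real rule script: a temporary file stands in for snakemake.log[0], which only exists inside a script run by Snakemake, and the variable names are illustrative.

```python
import os
import sys
import tempfile

# Stand-in for snakemake.log[0], which only exists inside a Snakemake script.
log_path = os.path.join(tempfile.mkdtemp(), "rule.log")

original_stderr = sys.stderr
sys.stderr = open(log_path, "w", buffering=1)  # line-buffered text stream

print("step 1 finished", file=sys.stderr)

# The line is already in the file although the stream was neither
# flushed explicitly nor closed.
with open(log_path) as fh:
    content_before_close = fh.read()

# Restore the original stream; this is only needed in this demonstration.
sys.stderr.close()
sys.stderr = original_stderr

print(repr(content_before_close))  # prints 'step 1 finished\n'
```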

For R scripts:

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

If you also want full backtraces for unexpected errors (errors not handled in your code or in a package you load), you can use:

rlang::global_entrace()

This requires the rlang package, which is installed as part of the tidyverse, for example. For details on the function, see the rlang documentation. It is not expected to reduce performance.

Saving time on DAG building and resolution

To quickly debug a particular rule, you can specify the output of that rule as the desired target when running Snakemake. This will speed up building the DAG, as Snakemake will only resolve the part of the DAG that is necessary to produce the specified output file. To avoid clashes with command line argument specifications, it is best to provide the desired output file as the first argument right after snakemake:

snakemake path/to/desired/output.file <other arguments>
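If you launch Snakemake from a Python helper, the same ordering can be kept when assembling the command. This is a small sketch under stated assumptions: build_debug_command is a hypothetical helper, the path is the placeholder from the command above, and --dry-run is added so that Snakemake only reports what it would execute.

```python
import shlex
from typing import List, Sequence


def build_debug_command(target: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Place the target directly after 'snakemake', followed by other arguments."""
    return ["snakemake", target, *extra_args]


cmd = build_debug_command(
    "path/to/desired/output.file",  # placeholder path, replace with a real output
    ["--cores", "1", "--dry-run"],  # --dry-run only shows what would be executed
)
print(shlex.join(cmd))
# To actually run the command: subprocess.run(cmd, check=True)
```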

Interactive debugging

Debugging the main workflow process using Visual Studio Code

The Visual Studio Code editor comes with a powerful debugger for Python. With a little tinkering, you can also use it to debug Snakemake’s main process, which can be very useful for developing both workflows and plugins for Snakemake.

Note

This is officially supported by neither Snakemake nor Visual Studio Code, and may crash frequently!

To set this up, perform the following steps:

Install Visual Studio Code and the Python debugger extension.
Open your workflow in Visual Studio Code and create or open the file .vscode/launch.json.
Add the following configuration to the launch.json file:

{
    "configurations": [

        // ...

        {
            "name": "DebugPy: debug Snakemake workflow",
            "request": "launch",
            "type": "debugpy",
            "cwd": "${workspaceFolder}",
            "args": [
                "snakemake",
                "--debug",
                "--cores",
                "1",
                "--nolock",
                "--forceall",
                "--executor",
                "local",
            ],
            "program": "-m",
            "python": "${command:python.interpreterPath}",
            "console": "internalConsole",
            "redirectOutput": true,
            "internalConsoleOptions": "openOnSessionStart",
            // Don't set justMyCode to 'true' - otherwise breakpoints will be skipped.
            // Technically they do not occur within your code, but within Snakemake's workflow.py
            "justMyCode": false,
        },
    ]
}

To now debug your workflow:

Set breakpoints in your Snakefile by inserting breakpoint() statements wherever you want to halt execution and inspect the state of the workflow, for example in the onerror handler:

onerror: # Is executed if the pipeline fails
    breakpoint()

Start the debugger in Visual Studio Code by navigating to the “Run and Debug” tab and selecting the “DebugPy: debug Snakemake workflow” configuration (keyboard shortcut: F5). This will open an interactive debugging console, allowing you to step through the workflow execution and inspect variables, e.g. Snakemake’s workflow object.

Note

This will not work for debugging the execution of individual jobs, regardless of whether they are executed locally or remotely. For this, use Snakemake’s --debug flag, described in the next section.

Further reading: Python debugging in VS Code

Debugging of individual jobs

For Python scripts / run blocks:

When executing Snakemake with the --debug flag, Snakemake will drop into an interactive Python debugger (PDB) session. By including breakpoint() statements in your code you can specify where PDB should halt execution, allowing you to explore the current state of the job.
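A breakpoint can also be placed so that it only triggers in an unexpected state. The following is a minimal sketch: the function mean_coverage and its input values are hypothetical, and in a real script the values would be derived from the snakemake object rather than hard-coded.

```python
def mean_coverage(depths):
    """Average a list of per-sample depths (hypothetical example data)."""
    if not depths:
        # Halt only in the unexpected case of an empty list, so that the
        # state can be inspected before the division below fails.
        breakpoint()
    return sum(depths) / len(depths)


# In a real script, the values would be derived from snakemake.input.
print(mean_coverage([10, 20, 30]))  # prints 20.0
```

With regular input the script runs through without interruption; the debugger is only entered when the guarded condition is met.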

For R scripts:

You can save the entire current state of a workspace in R for debugging. Insert this line right before the code that triggers an error:

save.image(file = "workspace.RData")

Activate the conda environment that the rule uses (you can find it in the rule’s logging output, in a statement of the form Activating conda environment: <path to environment>) and start an interactive R session. In this session, load all packages that the script needs with library(). Then you can load the full workspace and interactively explore and debug what is going on:

load("workspace.RData")

Preserving wrapper scripts

Snakemake produces a series of wrapper scripts for rules using the script directive (default location .snakemake/scripts/). Normally, these are deleted after each run. For debugging purposes, you can disable this behavior by running snakemake with the --skip-script-cleanup flag.