Snakemake


The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid, and cloud environments without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.

Snakemake is highly popular, with ~3 new citations per week.

Quick Example

Snakemake workflows are essentially Python scripts extended by declarative code to define rules. Rules describe how to create output files from input files.

rule targets:
    input:
        "plots/myplot.pdf"

rule transform:
    input:
        "raw/{dataset}.csv"
    output:
        "transformed/{dataset}.csv"
    singularity:
        "docker://somecontainer:v1.0"
    shell:
        "somecommand {input} {output}"

rule aggregate_and_plot:
    input:
        expand("transformed/{dataset}.csv", dataset=[1, 2])
    output:
        "plots/myplot.pdf"
    conda:
        "envs/matplotlib.yaml"
    script:
        "scripts/plot.py"
  • Similar to GNU Make, you specify targets in terms of a pseudo-rule at the top.
  • For each target and intermediate file, you create rules that define how they are created from input files.
  • Snakemake determines the rule dependencies by matching file names.
  • Input and output files can contain multiple named wildcards.
  • Rules can use shell commands, plain Python code, or external Python or R scripts to create output files from input files.
  • Snakemake workflows can be easily executed on workstations, clusters, the grid, and in the cloud without modification. Job scheduling can be constrained by arbitrary resources such as available CPU cores, memory, or GPUs.
  • Snakemake can automatically deploy required software dependencies of a workflow using Conda or Singularity.
  • Snakemake can use Amazon S3, Google Storage, Dropbox, FTP, WebDAV, SFTP and iRODS to access input or output files and further access input files via HTTP and HTTPS.
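The dependency matching and the `expand()` call in the quick example can be illustrated with a small, self-contained sketch. This is not Snakemake's actual implementation; `match_wildcards` and `expand_pattern` are hypothetical stand-ins showing, under simplified assumptions, how wildcard values are inferred from file names and how a pattern is filled with wildcard values:

```python
import re
from itertools import product

def match_wildcards(pattern, filename):
    # Simplified sketch: turn each "{name}" placeholder into a named
    # regex group and match it against the requested file name.
    # (Snakemake's real matcher also supports per-wildcard regex
    # constraints and treats the rest of the pattern literally.)
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    m = re.fullmatch(regex, filename)
    return m.groupdict() if m else None

def expand_pattern(pattern, **wildcards):
    # Simplified stand-in for expand(): substitute every combination
    # of the given wildcard values into the pattern.
    keys = list(wildcards)
    return [pattern.format(**dict(zip(keys, combo)))
            for combo in product(*(wildcards[k] for k in keys))]

# Requesting "transformed/1.csv" matches rule transform's output pattern:
print(match_wildcards("transformed/{dataset}.csv", "transformed/1.csv"))
# → {'dataset': '1'}

# expand() in rule aggregate_and_plot yields the concrete input files:
print(expand_pattern("transformed/{dataset}.csv", dataset=[1, 2]))
# → ['transformed/1.csv', 'transformed/2.csv']
```

In this way, Snakemake works backwards from the requested targets, matching each missing file against rule output patterns until all dependencies are resolved.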

Getting started

To get a first impression, see our introductory slides or watch the live demo video. News about Snakemake is published via Twitter. To learn Snakemake, please do the Snakemake Tutorial and see the FAQ.

Support

Resources

Snakemake Wrappers Repository
The Snakemake Wrapper Repository is a collection of reusable wrappers that make it easy to use popular tools from within Snakemake rules and workflows.
Snakemake Workflows Project
This project provides a collection of high-quality, modularized, and reusable workflows. The provided code should also serve as a best-practice example of how to build production-ready workflows with Snakemake. Everybody is invited to contribute.
Snakemake Profiles Project
This project provides Snakemake configuration profiles for various execution environments. Please consider contributing your own if yours is still missing.
Bioconda
Bioconda can be used from Snakemake to create fully reproducible workflows by defining the software versions used and providing the binaries.
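For example, the envs/matplotlib.yaml file referenced by the `conda:` directive in the quick example could pin exact versions from conda-forge or Bioconda. The channels and versions below are purely illustrative:

```yaml
# Hypothetical environment definition; pin the channels and exact
# versions your analysis actually uses.
channels:
  - conda-forge
  - bioconda
dependencies:
  - python =3.11
  - matplotlib =3.8
```

When the workflow is run with `--use-conda`, Snakemake creates this environment once and activates it for every job of the rule, so the same software versions are used on any machine.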

Publications using Snakemake

Below you find an incomplete list of publications that use Snakemake for their analyses. Please consider adding your own.