Welcome to Snakemake’s documentation!

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can include a description of the required software, which is then deployed automatically to any execution environment.

Quick Example

Snakemake workflows are essentially Python scripts extended by declarative code to define rules. Rules describe how to create output files from input files.

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    shell:
        "somecommand {input} {output}"

Similar to GNU Make, you specify targets in terms of a pseudo-rule at the top.
For each target and intermediate file, you create rules that define how they are created from input files.
Snakemake determines the rule dependencies by matching file names.
Input and output files can contain multiple named wildcards.
Rules can use shell commands, plain Python code, or external Python or R scripts to create output files from input files.
Snakemake workflows can be easily executed on workstations, clusters, the grid, and in the cloud without modification. Job scheduling can be constrained by arbitrary resources such as available CPU cores, memory or GPUs.
Snakemake can automatically deploy required software dependencies of a workflow using Conda or Singularity.
Snakemake can read input files from and write output files to Amazon S3, Google Storage, Dropbox, FTP, WebDAV and SFTP, and can additionally read input files via HTTP and HTTPS.
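
To illustrate how rule dependencies can follow from matching file names, here is a minimal Python sketch. It is not Snakemake's actual implementation: the helper names pattern_to_regex and resolve_inputs are hypothetical, each wildcard is assumed to occur only once per pattern, and every wildcard is assumed to match any non-empty string. The sketch matches a requested target against the output pattern of the plot rule above and fills the resulting wildcard values into the input pattern.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern like 'plots/{dataset}.pdf' into a regex with one
    named group per wildcard (hypothetical helper, not Snakemake API)."""
    # Splitting on a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)      # literal path fragment
        else:
            regex += f"(?P<{part}>.+)"    # assumption: wildcard matches any non-empty string
    return re.compile(regex)


def resolve_inputs(target, output_pattern, input_patterns):
    """Return the concrete input files needed to build target, or None if
    the output pattern does not match the target at all."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    if match is None:
        return None
    wildcards = match.groupdict()
    # Fill the matched wildcard values into every input pattern.
    return [pattern.format(**wildcards) for pattern in input_patterns]


if __name__ == "__main__":
    print(resolve_inputs("plots/dataset1.pdf",
                         "plots/{dataset}.pdf",
                         ["raw/{dataset}.csv"]))
```

Requesting plots/dataset1.pdf binds the wildcard dataset to dataset1, so the sketch reports raw/dataset1.csv as the required input. A target that does not fit the output pattern yields None, meaning this rule cannot produce it.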

Getting started

To get started, consider the Snakemake Tutorial, the introductory slides, and the FAQ.

Support

In case of questions, please post on Stack Overflow.
To discuss with other Snakemake users, you can use the mailing list.
For bugs and feature requests, please use the issue tracker.
For contributions, visit Snakemake on Bitbucket and read the guidelines.

Citation

Köster, Johannes and Rahmann, Sven. “Snakemake - A scalable bioinformatics workflow engine”. Bioinformatics 2012.

See Citations for more information.

Publications using Snakemake

The following is an incomplete list of publications that use Snakemake for their analyses. Please consider adding your own.

Uhlitz et al. 2017. An immediate–late gene expression module decodes ERK signal duration. Molecular Systems Biology.
Akkouche et al. 2017. Piwi Is Required during Drosophila Embryogenesis to License Dual-Strand piRNA Clusters for Transposon Repression in Adult Ovaries. Molecular Cell.
Beatty et al. 2017. Giardia duodenalis induces pathogenic dysbiosis of human intestinal microbiota biofilms. International Journal for Parasitology.
Meyer et al. 2017. Differential Gene Expression in the Human Brain Is Associated with Conserved, but Not Accelerated, Noncoding Sequences. Molecular Biology and Evolution.
Lonardo et al. 2017. Priming of soil organic matter: Chemical structure of added compounds is more important than the energy content. Soil Biology and Biochemistry.
Beisser et al. 2017. Comprehensive transcriptome analysis provides new insights into nutritional strategies and phylogenetic relationships of chrysophytes. PeerJ.
Dimitrov et al. 2017. Successive DNA extractions improve characterization of soil microbial communities. PeerJ.
de Bourcy et al. 2016. Phylogenetic analysis of the human antibody repertoire reveals quantitative signatures of immune senescence and aging. PNAS.
Bray et al. 2016. Near-optimal probabilistic RNA-seq quantification (http://www.nature.com/nbt/journal/v34/n5/abs/nbt.3519.html). Nature Biotechnology.
Etournay et al. 2016. TissueMiner: a multiscale analysis toolkit to quantify how cellular processes create tissue dynamics. eLife Sciences.
Townsend et al. 2016. The Public Repository of Xenografts Enables Discovery and Randomized Phase II-like Trials in Mice. Cancer Cell.
Burrows et al. 2016. Genetic Variation, Not Cell Type of Origin, Underlies the Majority of Identifiable Regulatory Differences in iPSCs. PLOS Genetics.
Ziller et al. 2015. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nature Methods.
Li et al. 2015. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biology.
Schmied et al. 2015. An automated workflow for parallel processing of large multiview SPIM recordings. Bioinformatics.
Chung et al. 2015. Whole-Genome Sequencing and Integrative Genomic Analysis Approach on Two 22q11.2 Deletion Syndrome Family Trios for Genotype to Phenotype Correlations. Human Mutation.
Kim et al. 2015. TUT7 controls the fate of precursor microRNAs by using three different uridylation mechanisms. The EMBO Journal.
Park et al. 2015. Ebola Virus Epidemiology, Transmission, and Evolution during Seven Months in Sierra Leone. Cell.
Břinda et al. 2015. RNF: a general framework to evaluate NGS read mappers. Bioinformatics.
Břinda et al. 2015. Spaced seeds improve k-mer-based metagenomic classification. Bioinformatics.
Spjuth et al. 2015. Experiences with workflows for automating data-intensive bioinformatics. Biology Direct.
Schramm et al. 2015. Mutational dynamics between primary and relapse neuroblastomas. Nature Genetics.
Berulava et al. 2015. N6-Adenosine Methylation in MiRNAs. PLOS ONE.
The Genome of the Netherlands Consortium 2014. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics.
Patterson et al. 2014. WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads. Journal of Computational Biology.
Fernández et al. 2014. H3K4me1 marks DNA regions hypomethylated during aging in human stem and differentiated cells. Genome Research.
Köster et al. 2014. Massively parallel read mapping on GPUs with the q-group index and PEANUT. PeerJ.
Chang et al. 2014. TAIL-seq: Genome-wide Determination of Poly(A) Tail Length and 3′ End Modifications. Molecular Cell.
Althoff et al. 2013. MiR-137 functions as a tumor suppressor in neuroblastoma by downregulating KDM1A. International Journal of Cancer.
Marschall et al. 2013. MATE-CLEVER: Mendelian-Inheritance-Aware Discovery and Genotyping of Midsize and Long Indels. Bioinformatics.
Rahmann et al. 2013. Identifying transcriptional miRNA biomarkers by integrating high-throughput sequencing and real-time PCR data. Methods.
Martin et al. 2013. Exome sequencing identifies recurrent somatic mutations in EIF1AX and SF3B1 in uveal melanoma with disomy 3. Nature Genetics.
Czeschik et al. 2013. Clinical and mutation data in 12 patients with the clinical diagnosis of Nager syndrome. Human Genetics.
Marschall et al. 2012. CLEVER: Clique-Enumerating Variant Finder. Bioinformatics.