Examples¶
Most of the examples below assume that Snakemake is executed in a project-specific root directory.
The paths in the Snakefiles below are relative to this directory.
We follow the convention to use different subdirectories for different intermediate results, e.g., mapped/
for mapped sequence reads in .bam
files, etc.
Cufflinks¶
Cufflinks is a tool to assemble transcripts, calculate abundance and conduct a differential expression analysis on RNA-Seq data. This example shows how to create a typical Cufflinks workflow with Snakemake. It assumes that mapped RNA-Seq data for four samples 101-104 is given as bam files.
- For each sample, transcripts are assembled with
cufflinks
(ruleassembly
). - Assemblies are merged into one gtf with
cuffmerge
(rulemerge_assemblies
). - A comparison to the hg19 gtf-track is conducted (rule
compare_assemblies
). - Finally, differential expressions are calculated on the found transcripts (rule
diffexp
).
# path to track and reference
TRACK = 'hg19.gtf'
REF = 'hg19.fa'
# sample names and classes
CLASS1 = '101 102'.split()
CLASS2 = '103 104'.split()
SAMPLES = CLASS1 + CLASS2
# path to bam files
CLASS1_BAM = expand('mapped/{sample}.bam', sample=CLASS1)
CLASS2_BAM = expand('mapped/{sample}.bam', sample=CLASS2)
rule all:
input:
'diffexp/isoform_exp.diff',
'assembly/comparison'
rule assembly:
input:
'mapped/{sample}.bam'
output:
'assembly/{sample}/transcripts.gtf',
dir='assembly/{sample}'
threads: 4
shell:
'cufflinks --num-threads {threads} -o {output.dir} '
'--frag-bias-correct {REF} {input}'
rule compose_merge:
input:
expand('assembly/{sample}/transcripts.gtf', sample=SAMPLES)
output:
txt='assembly/assemblies.txt'
run:
with open(output.txt, 'w') as out:
print(*input, sep="\n", file=out)
rule merge_assemblies:
input:
'assembly/assemblies.txt'
output:
'assembly/merged/merged.gtf', dir='assembly/merged'
shell:
'cuffmerge -o {output.dir} -s {REF} {input}'
rule compare_assemblies:
input:
'assembly/merged/merged.gtf'
output:
'assembly/comparison/all.stats',
dir='assembly/comparison'
shell:
'cuffcompare -o {output.dir}all -s {REF} -r {TRACK} {input}'
rule diffexp:
input:
class1=CLASS1_BAM,
class2=CLASS2_BAM,
gtf='assembly/merged/merged.gtf'
output:
'diffexp/gene_exp.diff', 'diffexp/isoform_exp.diff'
params:
class1=",".join(CLASS1_BAM),
class2=",".join(CLASS2_BAM)
threads: 8
shell:
'cuffdiff --num-threads {threads} {gtf} {params.class1} {params.class2}'
The execution plan of Snakemake for this workflow can be visualized with the following DAG.

Building a C Program¶
GNU Make is primarily used to build C/C++ code. Snakemake can do the same, while providing a superior readability due to less obscure variables inside the rules.
The following example Makefile was adapted from http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/.
IDIR=../include
ODIR=obj
LDIR=../lib
LIBS=-lm
CC=gcc
CFLAGS=-I$(IDIR)
_HEADERS = hello.h
HEADERS = $(patsubst %,$(IDIR)/%,$(_HEADERS))
_OBJS = hello.o hellofunc.o
OBJS = $(patsubst %,$(ODIR)/%,$(_OBJS))
# build the executable from the object files
hello: $(OBJS)
$(CC) -o $@ $^ $(CFLAGS)
# compile a single .c file to an .o file
$(ODIR)/%.o: %.c $(HEADERS)
$(CC) -c -o $@ $< $(CFLAGS)
# clean up temporary files
.PHONY: clean
clean:
rm -f $(ODIR)/*.o *~ core $(IDIR)/*~
A Snakefile can be easily written as
from os.path import join
IDIR = '../include'
ODIR = 'obj'
LDIR = '../lib'
LIBS = '-lm'
CC = 'gcc'
CFLAGS = '-I' + IDIR
_HEADERS = ['hello.h']
HEADERS = [join(IDIR, hfile) for hfile in _HEADERS]
_OBJS = ['hello.o', 'hellofunc.o']
OBJS = [join(ODIR, ofile) for ofile in _OBJS]
rule hello:
"""build the executable from the object files"""
output:
'hello'
input:
OBJS
shell:
"{CC} -o {output} {input} {CFLAGS} {LIBS}"
rule c_to_o:
"""compile a single .c file to an .o file"""
output:
temp('{ODIR}/{name}.o')
input:
'{name}.c', HEADERS
shell:
"{CC} -c -o {output} {input} {CFLAGS}"
rule clean:
"""clean up temporary files"""
shell:
"rm -f *~ core {IDIR}/*~"
As can be seen, the shell calls become more readable, e.g. "{CC} -c -o {output} {input} {CFLAGS}"
instead of $(CC) -c -o $@ $< $(CFLAGS)
. Further, Snakemake automatically deletes .o
-files when they are not needed anymore since they are marked as temp
.

Building a Paper with LaTeX¶
Building a scientific paper can be automated by Snakemake as well. Apart from compiling LaTeX code and invoking BibTeX, we provide a special rule to zip the needed files for online submission.
We first provide a Snakefile tex.rules
that contains rules that can be shared for any latex build task:
ruleorder: tex2pdf_with_bib > tex2pdf_without_bib
rule tex2pdf_with_bib:
input:
'{name}.tex',
'{name}.bib'
output:
'{name}.pdf'
shell:
"""
pdflatex {wildcards.name}
bibtex {wildcards.name}
pdflatex {wildcards.name}
pdflatex {wildcards.name}
"""
rule tex2pdf_without_bib:
input:
'{name}.tex'
output:
'{name}.pdf'
shell:
"""
pdflatex {wildcards.name}
pdflatex {wildcards.name}
"""
rule texclean:
shell:
"rm -f *.log *.aux *.bbl *.blg *.synctex.gz"
Note how we distinguish between a .tex
file with and without a corresponding .bib
with the same name.
Assuming that both paper.tex
and paper.bib
exist, an ambiguity arises: Both rules are, in principle, applicable.
This would lead to an AmbiguousRuleException
, but since we have specified an explicit rule order in the file, it is clear that in this case the rule tex2pdf_with_bib
is to be preferred.
If the paper.bib
file does not exist, that rule is not even applicable, and the only option is to execute rule tex2pdf_without_bib
.
Assuming that the above file is saved as tex.rules
, the actual documents are then built from a specific Snakefile that includes these common rules:
DOCUMENTS = ['document', 'response-to-editor']
TEXS = [doc+".tex" for doc in DOCUMENTS]
PDFS = [doc+".pdf" for doc in DOCUMENTS]
FIGURES = ['fig1.pdf']
include:
'tex.smrules'
rule all:
input:
PDFS
rule zipit:
output:
'upload.zip'
input:
TEXS, FIGURES, PDFS
shell:
'zip -T {output} {input}'
rule pdfclean:
shell:
"rm -f {PDFS}"
Hence the user can perform 4 different tasks. Build all PDFs:
$ snakemake
Create a zip-file for online submissions:
$ snakemake zipit
Clean up all PDFs:
$ snakemake pdfclean
Clean up latex temporary files:
$ snakemake texclean
The following DAG of jobs would be executed upon a full run:
