The Snakemake API

snakemake.snakemake(snakefile, batch=None, cache=None, report=None, report_stylesheet=None, containerize=False, lint=None, generate_unit_tests=None, listrules=False, list_target_rules=False, cores=1, nodes=None, local_cores=1, max_threads=None, resources={}, overwrite_threads=None, overwrite_scatter=None, default_resources=None, overwrite_resources=None, config={}, configfiles=None, config_args=None, workdir=None, targets=None, dryrun=False, touch=False, forcetargets=False, forceall=False, forcerun=[], until=[], omit_from=[], prioritytargets=[], stats=None, printreason=False, printshellcmds=False, debug_dag=False, printdag=False, printrulegraph=False, printfilegraph=False, printd3dag=False, nocolor=False, quiet=False, keepgoing=False, cluster=None, cluster_config=None, cluster_sync=None, drmaa=None, drmaa_log_dir=None, jobname='snakejob.{rulename}.{jobid}.sh', immediate_submit=False, standalone=False, ignore_ambiguity=False, snakemakepath=None, lock=True, unlock=False, cleanup_metadata=None, conda_cleanup_envs=False, cleanup_shadow=False, cleanup_scripts=True, force_incomplete=False, ignore_incomplete=False, list_version_changes=False, list_code_changes=False, list_input_changes=False, list_params_changes=False, list_untracked=False, list_resources=False, summary=False, archive=None, delete_all_output=False, delete_temp_output=False, detailed_summary=False, latency_wait=3, wait_for_files=None, print_compilation=False, debug=False, notemp=False, all_temp=False, keep_remote_local=False, nodeps=False, keep_target_files=False, allowed_rules=None, jobscript=None, greediness=None, no_hooks=False, overwrite_shellcmd=None, updated_files=None, log_handler=[], keep_logger=False, max_jobs_per_second=None, max_status_checks_per_second=100, restart_times=0, attempt=1, verbose=False, force_use_threads=False, use_conda=False, use_singularity=False, use_env_modules=False, singularity_args='', conda_frontend='conda', conda_prefix=None, conda_cleanup_pkgs=None, list_conda_envs=False, singularity_prefix=None, shadow_prefix=None, scheduler='ilp', scheduler_ilp_solver=None, conda_create_envs_only=False, mode=0, wrapper_prefix=None, kubernetes=None, container_image=None, tibanna=False, tibanna_sfn=None, google_lifesciences=False, google_lifesciences_regions=None, google_lifesciences_location=None, google_lifesciences_cache=False, tes=None, preemption_default=None, preemptible_rules=None, precommand='', default_remote_provider=None, default_remote_prefix='', tibanna_config=False, assume_shared_fs=True, cluster_status=None, export_cwl=None, show_failed_logs=False, keep_incomplete=False, keep_metadata=True, messaging=None, edit_notebook=None, envvars=None, overwrite_groups=None, group_components=None, max_inventory_wait_time=20, execute_subworkflows=True, conda_not_block_search_path_envvars=False, scheduler_solver_path=None, conda_base_path=None)

Run snakemake on a given snakefile.

This function provides access to the whole snakemake functionality. It is not thread-safe.
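
For example, a minimal programmatic invocation might look as follows (a sketch: the snakefile path and the config and resource values are illustrative placeholders; only snakefile is required, and the keyword arguments mirror the parameters documented below):

    import snakemake

    # Execute the workflow in the current working directory with 4 cores.
    success = snakemake.snakemake(
        "Snakefile",                          # path to the snakefile
        cores=4,                              # number of provided cores
        config={"samples": "samples.tsv"},    # override workflow config values (hypothetical key)
        resources={"gpu": 1, "io": 5},        # map resource names to integers
        printshellcmds=True,                  # print the shell command of each job
    )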

Parameters
  • snakefile (str) – the path to the snakefile

  • batch (Batch) – if given, compute only a partial DAG, as defined by the given Batch object (default None)

  • report (str) – create an HTML report for a previous run at the given path

  • lint (str) – print lints instead of executing (None, “plain” or “json”, default None)

  • listrules (bool) – list rules (default False)

  • list_target_rules (bool) – list target rules (default False)

  • cores (int) – the number of provided cores (ignored when using cluster support) (default 1)

  • nodes (int) – the number of provided cluster nodes (ignored without cluster support) (default None)

  • local_cores (int) – the number of provided local cores if in cluster mode (ignored without cluster support) (default 1)

  • resources (dict) – provided resources, a dictionary mapping resource names to integers, e.g. {"gpu": 1, "io": 5} (default {})

  • default_resources (DefaultResources) – default values for resources not defined in rules (default None)

  • config (dict) – override values for workflow config

  • workdir (str) – path to working directory (default None)

  • targets (list) – list of targets, e.g. rule or file names (default None)

  • dryrun (bool) – only dry-run the workflow (default False)

  • touch (bool) – only touch all output files if present (default False)

  • forcetargets (bool) – force given targets to be re-created (default False)

  • forceall (bool) – force all output files to be re-created (default False)

  • forcerun (list) – list of files and rules that shall be re-created/re-executed (default [])

  • execute_subworkflows (bool) – execute subworkflows if present (default True)

  • prioritytargets (list) – list of targets that shall be run with maximum priority (default [])

  • stats (str) – path to file that shall contain stats about the workflow execution (default None)

  • printreason (bool) – print the reason for the execution of each job (default False)

  • printshellcmds (bool) – print the shell command of each job (default False)

  • printdag (bool) – print the dag in the graphviz dot language (default False)

  • printrulegraph (bool) – print the graph of rules in the graphviz dot language (default False)

  • printfilegraph (bool) – print the graph of rules with their input and output files in the graphviz dot language (default False)

  • printd3dag (bool) – print a D3.js compatible JSON representation of the DAG (default False)

  • nocolor (bool) – do not print colored output (default False)

  • quiet (bool) – do not print any default job information (default False)

  • keepgoing (bool) – keep going upon errors (default False)

  • cluster (str) – submission command of a cluster or batch system to use, e.g. qsub (default None)

  • cluster_config (str,list) – configuration file for cluster options, or list thereof (default None)

  • cluster_sync (str) – blocking cluster submission command (like SGE ‘qsub -sync y’) (default None)

  • drmaa (str) – if not None use DRMAA for cluster support, str specifies native args passed to the cluster when submitting a job

  • drmaa_log_dir (str) – the path to stdout and stderr output of DRMAA jobs (default None)

  • jobname (str) – naming scheme for cluster job scripts (default “snakejob.{rulename}.{jobid}.sh”)

  • immediate_submit (bool) – immediately submit all cluster jobs, regardless of dependencies (default False)

  • standalone (bool) – kill all processes very rudely in case of failure (do not use this if you use this API) (default False) (deprecated)

  • ignore_ambiguity (bool) – ignore ambiguous rules and always take the first possible one (default False)

  • snakemakepath (str) – deprecated parameter whose value is ignored. Do not use.

  • lock (bool) – lock the working directory when executing the workflow (default True)

  • unlock (bool) – just unlock the working directory (default False)

  • cleanup_metadata (list) – just cleanup metadata of given list of output files (default None)

  • keep_metadata (bool) – keep metadata file tracking information after a job finishes; if False, --report and --list_x_changes information will be incomplete (default True)

  • conda_cleanup_envs (bool) – just cleanup unused conda environments (default False)

  • cleanup_shadow (bool) – just cleanup old shadow directories (default False)

  • cleanup_scripts (bool) – delete wrapper scripts used for execution (default True)

  • force_incomplete (bool) – force the re-creation of incomplete files (default False)

  • ignore_incomplete (bool) – ignore incomplete files (default False)

  • list_version_changes (bool) – list output files with changed rule version (default False)

  • list_code_changes (bool) – list output files with changed rule code (default False)

  • list_input_changes (bool) – list output files with changed input files (default False)

  • list_params_changes (bool) – list output files with changed params (default False)

  • list_untracked (bool) – list files in the workdir that are not used in the workflow (default False)

  • summary (bool) – list summary of all output files and their status (default False)

  • archive (str) – archive workflow into the given tarball

  • delete_all_output (bool) – remove all files generated by the workflow (default False)

  • delete_temp_output (bool) – remove all temporary files generated by the workflow (default False)

  • latency_wait (int) – how many seconds to wait for an output file to appear after the execution of a job, e.g. to handle filesystem latency (default 3)

  • wait_for_files (list) – wait for given files to be present before executing the workflow

  • list_resources (bool) – list resources used in the workflow (default False)

  • detailed_summary (bool) – list a summary of all input and output files and their status, including extra info about input files and shell commands (default False)

  • print_compilation (bool) – print the compilation of the snakefile (default False)

  • debug (bool) – allow to use the debugger within rules

  • notemp (bool) – ignore temp file flags, e.g. do not delete output files marked as temp after use (default False)

  • keep_remote_local (bool) – keep local copies of remote files (default False)

  • nodeps (bool) – ignore dependencies (default False)

  • keep_target_files (bool) – do not adjust the paths of given target files relative to the working directory.

  • allowed_rules (set) – restrict allowed rules to the given set. If None or empty, all rules are used.

  • jobscript (str) – path to a custom shell script template for cluster jobs (default None)

  • greediness (float) – set the greediness of scheduling. This value between 0 and 1 determines how carefully jobs are selected for execution. The default value (0.5 if prioritytargets are used, 1.0 otherwise) provides the best speed with still acceptable scheduling quality.

  • overwrite_shellcmd (str) – a shell command that shall be executed instead of those given in the workflow. This is for debugging purposes only.

  • updated_files (list) – a list that will be filled with the files that are updated or created during the workflow execution

  • verbose (bool) – show additional debug output (default False)

  • max_jobs_per_second (int) – maximal number of cluster/drmaa jobs per second, None to impose no limit (default None)

  • restart_times (int) – number of times to restart failing jobs (default 0)

  • attempt (int) – initial value of Job.attempt. This is intended for internal use only (default 1).

  • force_use_threads (bool) – whether to force use of threads over processes; helpful if shared memory is full or unavailable (default False)

  • use_conda (bool) – use conda environments for each job (defined with conda directive of rules)

  • use_singularity (bool) – run jobs in singularity containers (if defined with singularity directive)

  • use_env_modules (bool) – load environment modules if defined in rules

  • singularity_args (str) – additional arguments to pass to singularity

  • conda_prefix (str) – the directory in which conda environments will be created (default None)

  • conda_cleanup_pkgs (snakemake.deployment.conda.CondaCleanupMode) – whether to clean up conda tarballs after env creation (default None), valid values: “tarballs”, “cache”

  • singularity_prefix (str) – the directory to which singularity images will be pulled (default None)

  • shadow_prefix (str) – prefix for shadow directories. The job-specific shadow directories will be created in $SHADOW_PREFIX/shadow/ (default None)

  • conda_create_envs_only (bool) – if specified, only builds the conda environments specified for each job, then exits.

  • list_conda_envs (bool) – list conda environments and their location on disk.

  • mode (snakemake.common.Mode) – execution mode

  • wrapper_prefix (str) – prefix for wrapper script URLs (default None)

  • kubernetes (str) – submit jobs to kubernetes, using the given namespace.

  • container_image (str) – Docker image to use, e.g., for kubernetes.

  • default_remote_provider (str) – default remote provider to use instead of local files (e.g. S3, GS)

  • default_remote_prefix (str) – prefix for default remote provider (e.g. name of the bucket).

  • tibanna (bool) – submit jobs to AWS cloud using Tibanna.

  • tibanna_sfn (str) – Step function (Unicorn) name of Tibanna (e.g. tibanna_unicorn_monty). This must be deployed first using tibanna cli.

  • google_lifesciences (bool) – submit jobs to Google Cloud Life Sciences (pipelines API).

  • google_lifesciences_regions (list) – a list of regions (e.g., us-east1)

  • google_lifesciences_location (str) – Life Sciences API location (e.g., us-central1)

  • google_lifesciences_cache (bool) – save a cache of the compressed working directories in Google Cloud Storage for later usage.

  • tes (str) – Execute workflow tasks on GA4GH TES server given by url.

  • precommand (str) – commands to run on AWS cloud before the snakemake command (e.g. wget, git clone, unzip, etc). Use with --tibanna.

  • preemption_default (int) – set a default number of preemptible instance retries (for Google Life Sciences executor only)

  • preemptible_rules (list) – define custom preemptible instance retries for specific rules (for Google Life Sciences executor only)

  • tibanna_config (list) – additional tibanna config, e.g. --tibanna-config spot_instance=true subnet=<subnet_id> security_group=<security_group_id>

  • assume_shared_fs (bool) – assume that cluster nodes share a common filesystem (default True).

  • cluster_status (str) – status command for cluster execution. If None, Snakemake will rely on flag files. Otherwise, it expects the command to return “success”, “failure” or “running” when executing with a cluster jobid as single argument.

  • export_cwl (str) – Compile workflow to CWL and save to given file

  • keep_incomplete (bool) – keep incomplete output files of failed jobs

  • edit_notebook (object) – notebook.EditMode object configuring the notebook server for interactive editing of a rule notebook. If None, do not edit.

  • scheduler (str) – Select scheduling algorithm (default ilp)

  • scheduler_ilp_solver (str) – Set solver for ilp scheduler.

  • overwrite_groups (dict) – mapping of rules to group assignments (default None)

  • group_components (dict) – number of connected components the given groups shall span before being split up (1 by default if empty)

  • conda_not_block_search_path_envvars (bool) – Do not block search path envvars (R_LIBS, PYTHONPATH, …) when using conda environments.

  • scheduler_solver_path (str) – Path to Snakemake environment (this can be used to e.g. overwrite the search path for the ILP solver used during scheduling).

  • conda_base_path (str) – Path to conda base environment (this can be used to overwrite the search path for conda, mamba and activate).

  • log_handler (list) – redirect snakemake output to this list of custom log handlers, each a function that takes a log message dictionary (see below) as its only argument (default []). A minimal handler sketch follows this parameter list. Depending on its level, the log message dictionary has the following entries:

    level – the log level (“info”, “error”, “debug”, “progress”, “job_info”)

    if level is “info”, “error” or “debug”:

      msg – the log message

    if level is “progress”:

      done – number of already executed jobs

      total – number of total jobs

    if level is “job_info”:

      input – list of input files of a job

      output – list of output files of a job

      log – path to log file of a job

      local – whether a job is executed locally (i.e. ignoring cluster)

      msg – the job message

      reason – the job reason

      priority – the job priority

      threads – the threads of the job
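
As a sketch of such a handler (the function name is an illustrative choice; the dictionary entries accessed are those documented above):

    def my_log_handler(msg):
        # msg is the log message dictionary described above
        level = msg.get("level")
        if level == "progress":
            print(f"{msg['done']}/{msg['total']} jobs done")
        elif level in ("info", "error", "debug"):
            print(f"[{level}] {msg['msg']}")
        elif level == "job_info":
            print("job outputs:", msg.get("output"))

    # Handlers are passed as a list:
    # snakemake.snakemake("Snakefile", cores=1, log_handler=[my_log_handler])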

Returns

True if workflow execution was successful.

Return type

bool
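
Since the function returns True only on successful execution rather than raising on workflow failure, callers can check the result explicitly; a minimal sketch:

    import sys
    import snakemake

    # Validate the DAG with a dry-run first, then execute for real.
    ok = snakemake.snakemake("Snakefile", cores=2, dryrun=True)
    if ok:
        ok = snakemake.snakemake("Snakefile", cores=2)
    if not ok:
        sys.exit(1)  # propagate the failure to the calling shell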