Command line interface
This part of the documentation describes the snakemake executable. Because
Snakemake is primarily a command-line tool, this executable is the main way
to execute, debug, and visualize workflows.
Important environment variables
Snakemake caches source files for performance and reproducibility.
The location of this cache is determined by the platformdirs package.
If you want to change the location on a Unix/Linux system, you can define an override path via the environment variable XDG_CACHE_HOME.
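The way such an override is resolved on Linux can be sketched in a few lines of Python. This is only an illustration of the XDG convention, not the code that Snakemake or platformdirs actually run; the function name cache_base_dir is hypothetical, and the application-specific subdirectory below the returned path is not shown.

```python
import os
from pathlib import Path


def cache_base_dir(env=None):
    """Return the base cache directory following the XDG convention.

    Assumption: this mirrors the convention only; the real lookup is
    delegated to the platformdirs package.
    """
    env = os.environ if env is None else env
    override = env.get("XDG_CACHE_HOME")
    if override:
        # an explicitly set XDG_CACHE_HOME wins
        return Path(override)
    # fallback defined by the XDG Base Directory convention
    return Path.home() / ".cache"
```

Calling cache_base_dir({"XDG_CACHE_HOME": "/scratch/cache"}) yields the override path, while an empty mapping falls back to the .cache folder in the home directory.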
Useful Command Line Arguments
If called with the number of cores to use, i.e.
$ snakemake --cores 1
Snakemake tries to execute the workflow specified in a file called Snakefile in the same directory (the Snakefile can be given via the parameter -s).
By issuing
$ snakemake -n
a dry-run can be performed. This is useful to test if the workflow is defined properly and to estimate the amount of needed computation.
Importantly, Snakemake can automatically determine which parts of the workflow can be run in parallel. By specifying more than one available core, i.e.
$ snakemake --cores 4
one can tell Snakemake to use up to 4 cores; it then solves a binary knapsack problem to optimize the scheduling of jobs within that limit.
If the number is omitted (i.e., only --cores is given), the number of used cores is determined as the number of available CPU cores in the machine.
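To give an intuition for what a binary knapsack formulation of scheduling means, the following minimal sketch picks, from a set of ready jobs, the subset that fits into the available cores and maximizes a total value. It is not Snakemake's scheduler, which takes further resources and priorities into account; the function name, the job names and the values are hypothetical.

```python
def select_jobs(jobs, cores):
    """Pick a subset of ready jobs that fits into `cores` and maximizes total value.

    `jobs` is a list of (name, threads, value) tuples with positive integer
    threads. Assumption: a plain 0/1 knapsack over cores only, used purely
    for illustration.
    """
    # best[c] = (total value, chosen job names) achievable with at most c cores
    best = [(0, ())] * (cores + 1)
    for name, threads, value in jobs:
        # iterate budgets downwards so that each job is used at most once
        for c in range(cores, threads - 1, -1):
            candidate_value = best[c - threads][0] + value
            if candidate_value > best[c][0]:
                best[c] = (candidate_value, best[c - threads][1] + (name,))
    return best[cores]


# hypothetical ready jobs: (name, threads, value)
ready = [("align_a", 2, 2), ("align_b", 2, 2), ("sort", 1, 1), ("assemble", 4, 3)]
print(select_jobs(ready, 4))  # (4, ('align_a', 'align_b'))
```

With 4 cores, running the two 2-thread jobs side by side beats running the single 4-thread job, which is the kind of trade-off the scheduler has to weigh.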
Snakemake workflows usually define the number of used threads of certain rules. Sometimes, it makes sense to overwrite the defaults given in the workflow definition.
This can be done by using the --set-threads argument, e.g.,
$ snakemake --cores 4 --set-threads myrule=2
would overwrite whatever number of threads has been defined for the rule myrule and use 2 instead.
Similarly, it is possible to overwrite other resource definitions in rules, via
$ snakemake --cores 4 --set-resources myrule:partition="foo"
Both mechanisms can be particularly handy when used in combination with non-local execution.
Non-local execution
Non-local execution on cluster or cloud infrastructure is implemented via executor plugins. The Snakemake plugin catalog lists available plugins and their documentation. In general, the configuration boils down to specifying an executor plugin (e.g. for SLURM or Kubernetes) and, if needed, a storage plugin (e.g. in order to use S3 for input and output files or in order to efficiently use a shared network filesystem). For maximizing the I/O performance over the network, it can be advisable to annotate the input file access patterns of rules. Snakemake provides lots of tunables for non-local execution, which can all be found under All Options and in the plugin descriptions of the Snakemake plugin catalog. In any case, the cluster or cloud specific configuration will entail lots of command line options to be chosen and set, which should be persisted in a profile.
Dealing with very large workflows
If your workflow has a lot of jobs, Snakemake might need some time to infer the dependencies (the job DAG) and which jobs are actually required to run. The major bottleneck involved is the filesystem, which has to be queried for existence and modification dates of files. To overcome this issue, Snakemake allows running large workflows in batches. This way, fewer files have to be evaluated at once, and therefore the job DAG can be inferred faster. By running
$ snakemake --cores 4 --batch myrule=1/3
you instruct Snakemake to compute only the first of three batches of the inputs of the rule myrule.
To generate the second batch, run
$ snakemake --cores 4 --batch myrule=2/3
Finally, when running
$ snakemake --cores 4 --batch myrule=3/3
Snakemake will process beyond the rule myrule, because all of its input files have been generated, and complete the workflow.
Obviously, a good choice of the rule to perform the batching is a rule that has a lot of input files and upstream jobs, for example a central aggregation step within your workflow.
We advise all workflow developers to inform potential users of the best suited batching rule.
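The idea of splitting the inputs of a rule into batches can be sketched as follows. This is a minimal illustration under the assumption that inputs are sorted and cut into equally sized slices, with the final batch taking the remainder; the exact partitioning Snakemake applies may differ, and the function name is hypothetical.

```python
import math


def get_batch(items, batch, batches):
    """Return the inputs that belong to batch `batch` of `batches` (1-based)."""
    if not 1 <= batch <= batches:
        raise ValueError("batch must be between 1 and batches")
    if len(items) < batches:
        raise ValueError("need at least as many input files as batches")
    # Assumption: a fixed order, so that every invocation sees the same batches
    items = sorted(items)
    batch_len = math.floor(len(items) / batches)
    start = (batch - 1) * batch_len
    if batch == batches:
        # the final batch also covers any remainder
        return items[start:]
    return items[start:start + batch_len]
```

With seven input files and three batches, the first two batches hold two files each and the last one holds three, so that all inputs are present once batch 3/3 has run.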
Profiles
Adapting runs of Snakemake workflows to a particular computing environment can entail many flags and options. Therefore, since Snakemake 4.1, it is possible to set default options in configuration profile files in YAML format. Two kinds of profiles are supported:
Global profiles are used to define default options for a particular system or compute environment, like the default cluster submission command, the default number of jobs to run in parallel or the default amount of memory to reserve for a job. They should be applicable to all Snakemake workflows a user runs in that compute environment.
A workflow specific profile (introduced in Snakemake 7.29) is used to define default and rule-specific Resources specifications for a particular workflow instance.
Profile YAML files
The default naming pattern for profile YAML files is profile.v9+.yaml, where the version specifier infix v9+. is optional.
This naming pattern is required when you refer to profiles by (directory) name or relative path (to directory containing the actual YAML file), or if you want to specify a minimum required version of snakemake via the optional infix (vX+).
If you directly reference the actual YAML file by name, you can use an arbitrary name for the profile YAML file.
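The naming pattern can be checked with a small regular expression. The following is a sketch that assumes the infix only ever encodes a single integer version as in the example profile.v9+.yaml; the full set of names that Snakemake accepts may be broader, and the helper name is hypothetical.

```python
import re

# Assumption: the optional infix has the form ".v<integer>+"
PROFILE_NAME = re.compile(r"^profile(?:\.v(?P<min_version>\d+)\+)?\.yaml$")


def min_version_of(filename):
    """Return the minimum version encoded in a standard profile file name.

    Returns None if the name carries no version infix and raises ValueError
    if the name does not follow the standard pattern at all.
    """
    match = PROFILE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a standard profile file name: {filename}")
    version = match.group("min_version")
    return int(version) if version is not None else None
```

Here min_version_of("profile.v9+.yaml") gives 9 and min_version_of("profile.yaml") gives None, while an arbitrary name such as custom_profile_name.yaml is rejected and would have to be referenced directly.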
Alongside the actual profile YAML file, the profile folder can additionally contain auxiliary files. These can for example be jobscripts or wrappers. See https://github.com/snakemake/snakemake-cluster-profiles for examples.
While the different types of profiles should usually contain distinct sets of settings, you can configure any of Snakemake’s command line arguments in any of these profiles.
However, if you also provide the same argument in the snakemake call on the command line, this command line specification will always take precedence.
For example, a global profile YAML file with
executor: slurm
jobs: 110
default-resources:
  mem_mb: 1024
would set Snakemake to always submit to the SLURM cluster using the respective executor plugin, and to never use more than 110 parallel jobs in total.
It gets interpreted into setting --executor slurm --jobs 110 --default-resources mem_mb=1024 on the command line.
For more complex (nested) options, you can use standard YAML nesting syntax; and for simple switch flags, you can set or unset them with the values True and False, respectively.
So, for example, this YAML map in a workflow specific profile
keep-going: True
set-threads:
  myrule: 5
set-resources:
  myrule:
    mem_mb: 500
will be parsed to --keep-going --set-threads myrule=5 --set-resources myrule:mem_mb=500.
Alternatively, you can also specify anything below the top level keys as a string.
So the following would parse to the same command line argument setup:
set-threads: myrule=5
set-resources: myrule:mem_mb=500
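The translation from profile entries to command line arguments described above can be sketched as a small function operating on an already parsed YAML mapping. This is an illustration of the mapping rules as stated in this section, not Snakemake's parser; the function name is hypothetical.

```python
def profile_to_args(profile):
    """Translate a parsed profile mapping into a list of command line tokens."""
    args = []
    for key, value in profile.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            # switch flags are only emitted when set to True
            if value:
                args.append(flag)
        elif isinstance(value, dict):
            args.append(flag)
            for name, inner in value.items():
                if isinstance(inner, dict):
                    # two levels, e.g. set-resources: rule -> resource -> value
                    args.extend(f"{name}:{res}={val}" for res, val in inner.items())
                else:
                    # one level, e.g. set-threads: rule -> threads
                    args.append(f"{name}={inner}")
        else:
            # scalars and pre-formatted strings are passed through unchanged
            args.extend([flag, str(value)])
    return args


nested = {
    "keep-going": True,
    "set-threads": {"myrule": 5},
    "set-resources": {"myrule": {"mem_mb": 500}},
}
print(" ".join(profile_to_args(nested)))
# --keep-going --set-threads myrule=5 --set-resources myrule:mem_mb=500
```

The string form, such as {"set-threads": "myrule=5"}, passes through the last branch and ends up as the same tokens.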
All of these resource specifications can also be made dynamic, by using expressions and certain variables that are available.
For details of the variables you can use, refer to the callable signatures given in the documentation sections on the specification of threads and dynamic resources.
These enable profile.yaml entries like:
default-resources:
  mem_mb: max(1.5 * input.size_mb, 100)
set-threads:
  myrule: max(input.size_mb / 5, 2)
set-resources:
  myrule:
    mem_mb: attempt * 200
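To see what such expressions evaluate to for a given job, they can be tried out in plain Python. The sketch below assumes that input.size_mb, attempt and threads are the available variables and uses eval with a restricted namespace purely for illustration; it is neither a safe sandbox nor the mechanism Snakemake uses, and the function name is hypothetical.

```python
from types import SimpleNamespace


def evaluate_resource(expression, *, input_size_mb, attempt=1, threads=1):
    """Evaluate a resource expression against a few job-specific variables."""
    # Assumption: only these names are visible to the expression
    namespace = {
        "input": SimpleNamespace(size_mb=input_size_mb),
        "attempt": attempt,
        "threads": threads,
        "max": max,
        "min": min,
    }
    # no builtins are exposed; only the names listed above can be used
    return eval(expression, {"__builtins__": {}}, namespace)
```

For a job with 40 MB of input, the default mem_mb expression above stays at its floor of 100, whereas 200 MB of input yields 300.0; the expression attempt * 200 gives 600 on the third attempt.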
Also, values in profiles can make use of globally available environment variables, for example the $USER variable.
For example, the following entry would set the default prefix for storing local copies of remote storage files to a user-specific directory:
local-storage-prefix: /local/work/$USER/snakemake-scratch
Any such environment variables are automatically expanded when evaluating the profile.
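The effect of this expansion can be reproduced with the Python standard library. The following sketch walks a parsed profile and expands variables in every string value via os.path.expandvars; whether Snakemake uses exactly this call is an assumption, and the function name is hypothetical.

```python
import os


def expand_profile_values(profile):
    """Recursively expand $VAR references in all string values of a profile."""
    if isinstance(profile, dict):
        return {key: expand_profile_values(value) for key, value in profile.items()}
    if isinstance(profile, list):
        return [expand_profile_values(value) for value in profile]
    if isinstance(profile, str):
        # Assumption: standard shell-style expansion; unknown variables stay as-is
        return os.path.expandvars(profile)
    return profile
```

With USER set to alice, the entry from the example above becomes /local/work/alice/snakemake-scratch, while non-string values such as jobs: 110 are left untouched.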
Finally, we recommend annotating such profiles with clear comments. From experience, the most useful mode is usually to include comments right above a setting, including the reasoning behind the chosen value and linkouts to any documentation with further information. For inspiration, see the examples in the following sections.
Global profiles
Global profiles are used to define default options for a particular system or compute environment, applicable to all Snakemake workflows run on a particular system.
Defining global profiles
Default options specified in global profiles will include things like the default cluster submission command, the default number of jobs to run in parallel or the default amount of memory to reserve for a job. We recommend clearly explaining the motivation for any configuration choice in comments, for example:
# This cluster uses the slurm job submission system, for details on how to
# configure the respective executor plugin for snakemake, see
# https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html
executor: slurm
# This cluster allows users to run 100 jobs concurrently. As the slurm
# executor plugin only checks for completed jobs with a lag, we slightly
# oversubmit jobs to always have jobs available in the queue.
jobs: 110
# This cluster allows the use of 200 cores per user. As with the number of
# jobs, we slightly oversubmit.
cores: 220
# If a rule doesn't have the following resources specified, it will default
# to requesting the resources specified here.
default-resources:
  mem_mb: 1024
Also, before creating your own global profile from scratch, check whether someone has already created and shared such a profile for your local compute environment at https://github.com/snakemake/snakemake-cluster-profiles. This repository can also serve as inspiration when creating a new profile. And if you have created such a profile for your local compute cluster, feel free to share it via this repository. Just make sure to check with your local system administrators whether all the information included is OK to be shared publicly.
Using global profiles
To make use of such a global profile, you can make Snakemake aware of it in one of two ways:
You set the environment variable $SNAKEMAKE_PROFILE.
You use the command line argument --profile.
In either of these cases, you can reference a profile in multiple ways:
With a profile name, which represents a subdirectory of one of the standard locations, and is assumed to contain a file called profile.yaml (or config.yaml for backwards compatibility).
With a path to such a directory containing a profile.yaml file, relative to one of the standard locations.
With a path to a YAML file with an arbitrary name, relative to one of the standard locations.
The standard locations that snakemake searches are the current working directory, a standard system-wide location and a standard user-specific location.
As the system and user locations are system-dependent, you should always check the --profile entry in the output of snakemake --help.
This will list the locations specific to the system that you run the command on.
On Linux, these locations are /etc/xdg/snakemake (system-wide) and $HOME/.config/snakemake (user-specific).
As this leads to a plethora of ways to specify profiles, let us provide some examples.
As a first example, assume a system-wide profile with the absolute path /etc/xdg/snakemake/system_profile/profile.v9+.yaml on a Linux system.
You can set the $SNAKEMAKE_PROFILE variable or the --profile argument to any of:
system_profile
system_profile/profile.v9+.yaml
/etc/xdg/snakemake/system_profile/profile.v9+.yaml
As a second example, assume a user-specific profile with absolute path $HOME/.config/snakemake/user_profile/profile.yaml on a Linux system.
You can set the $SNAKEMAKE_PROFILE variable or the --profile argument to any of:
user_profile
user_profile/profile.yaml
$HOME/.config/snakemake/user_profile/profile.yaml
As a third example, assume a user-specific profile in a custom location, /path/to/user_profile/custom_profile_name.yaml.
In this case you have to always use the absolute path because the profile is not in a standard location, and you also have to reference the file by name, as it doesn’t follow the standard naming convention.
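The three ways of referencing a profile can be summarised in a short lookup sketch. It is a simplified model of the rules described in this section rather than Snakemake's implementation: the order in which search locations and multiple version-infixed files are tried is an assumption, and the function name is hypothetical.

```python
from pathlib import Path


def find_profile(spec, search_dirs):
    """Resolve a profile name, directory path or YAML file path to a YAML file."""
    spec = Path(spec).expanduser()
    # absolute paths are used as given, everything else is tried per location
    bases = [spec] if spec.is_absolute() else [Path(d) / spec for d in search_dirs]
    for base in bases:
        if base.is_file():
            # direct reference to a YAML file, arbitrary names are allowed
            return base
        if base.is_dir():
            # Assumption: version-infixed names first, then plain and legacy names
            candidates = sorted(base.glob("profile.v*+.yaml"), reverse=True)
            candidates += [base / "profile.yaml", base / "config.yaml"]
            for candidate in candidates:
                if candidate.is_file():
                    return candidate
    return None
```

On a Linux system, search_dirs would hold the current working directory, /etc/xdg/snakemake and $HOME/.config/snakemake, so that find_profile("system_profile", search_dirs) and find_profile("system_profile/profile.v9+.yaml", search_dirs) end up at the same file.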
The environment variable can be set by system administrators, providing a default profile for all users of a system.
Alternatively, users can set this environment variable for themselves.
Usually, this is done by setting the following in your shell startup configuration (for example in your ~/.bashrc for bash shells):
export SNAKEMAKE_PROFILE="$HOME/.config/snakemake/my_profile/profile.v9+.yaml"
If you instead provide the profile via the --profile command line argument, the $SNAKEMAKE_PROFILE environment variable will be ignored.
Using multiple global profiles
If multiple instances of the --profile command line argument are given, all the profiles are merged.
While merging, the profile instances specified later take precedence over earlier instances, wherever the same top-level entries occur in multiple profiles.
Take the following example files and their invocation.
/path/to/system_profile.yaml
executor: slurm
cores: 100
default-resources:
  mem_mb: 1024
  # some resource, where you can only use two at any time
  rate_limiter: 2
~/.config/snakemake/user_profile/profile.yaml
cores: 50
default-resources:
  mem_mb: 8000
When loaded in the order
snakemake --profile /path/to/system_profile.yaml --profile user_profile
this will lead to snakemake being run with the configuration
executor: slurm
cores: 50
default-resources:
  mem_mb: 8000
Thus, the user_profile takes precedence over the system_profile.yaml.
As the user_profile does not specify an executor, executor: slurm is kept.
As the user_profile also specifies cores, its entry of cores: 50 overwrites the value from the system_profile.yaml.
And as you can see from the last example, this overwriting of entries happens at the top level of YAML entries.
As the user_profile specifies the top-level entry default-resources, this whole entry from system_profile.yaml is discarded and replaced by what is specified in user_profile.
Thus, the default-resources: rate_limiter: 2 entry is lost.
Workflow specific profiles take precedence over global profiles in the same way. And arguments specified on the command line take precedence over any profile YAML files.
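The top-level merging behaviour corresponds to a plain dictionary update applied in the order in which the profiles are given. The following sketch reproduces the example above on already parsed profiles; it models the described semantics only, and the function name is hypothetical.

```python
def merge_profiles(*profiles):
    """Merge profile mappings; later profiles replace whole top-level entries."""
    merged = {}
    for profile in profiles:
        # dict.update swaps in the complete value of each top-level key,
        # so nested mappings are replaced rather than merged
        merged.update(profile)
    return merged


system_profile = {
    "executor": "slurm",
    "cores": 100,
    "default-resources": {"mem_mb": 1024, "rate_limiter": 2},
}
user_profile = {"cores": 50, "default-resources": {"mem_mb": 8000}}

print(merge_profiles(system_profile, user_profile))
# {'executor': 'slurm', 'cores': 50, 'default-resources': {'mem_mb': 8000}}
```

Because the whole default-resources mapping is replaced, rate_limiter is gone from the result; keeping it would require repeating it in the later profile.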
Workflow specific profile
A workflow specific profile (introduced in Snakemake 7.29) is used to define default and rule-specific Resources specifications for a particular workflow.
Defining workflow specific profiles
Mostly, a workflow-specific profile is meant to set rule-specific resources and command-line arguments that are specific to the running of that workflow.
# rule-specific threads settings
set-threads:
  # my_rule is set to threads: 4 in the rule, but we are trying out more
  # parallelism on this instance. If this proves useful, we'll propagate
  # this back to the rule definition.
  my_rule: 8
# default non-threads resources and any arbitrary resources a workflow
# defines, can be controlled here
set-resources:
  my_rule:
    # This particular dataset seems to need more memory, while usually
    # the mem_mb=80000 from the rule is enough. Maybe we can create a
    # dynamic resource allocation based on the following issue: <link>
    mem_mb: 400000
    # This much memory is only available in this dedicated slurm partition.
    slurm_partition: high_memory
# See docs on defining scatter-gather processes
set-scatter:
  # For this dataset, the default scatter for this rule of 200 is far too
  # much. We can optimise it by splitting into fewer but bigger chunks of
  # data.
  scatter_rule: 4
But a workflow specific profile can also overwrite default-resources from global profiles, for example:
# default resources that are assigned if a rule doesn't have mem_mb
# specified via its `resources:` directive.
default-resources:
  # While a lower amount of memory might be a good default for other
  # workflows, most rules in this one just need a higher amount.
  mem_mb: 32000
So, as you can already see from the examples, most of the settings in these kinds of profiles should eventually be propagated back into (dynamic) resource settings for every individual rule (via the resources: directive).
But they are a very good tool for a number of purposes.
For example, they let you quickly change things on the fly, without having to change anything in the underlying workflow and wait for another release.
They also allow you to distribute workflow profiles that optimise the resource usage of a workflow's rules for a particular computing environment.
Especially for the latter, it can be useful to distribute workflow specific profiles along with the workflow itself.
For example, when the workflow has its Snakefile at workflow/Snakefile, a profile tailored to a particular xyz_cluster could be placed at workflow/profiles/xyz_cluster/profile.yaml and then used with --workflow-profile xyz_cluster.
Using workflow specific profiles
To make use of such a workflow specific profile, you can make Snakemake aware of it in one of two ways:
You give it a standardised filename (for example profile.v9+.yaml, see the section on Profile YAML files) and save it in the default folder hierarchy relative to the Snakefile or the current working directory, usually either profiles/default/profile.yaml or workflow/profiles/default/profile.yaml.
You use the command line argument --workflow-profile.
Note that even without specifying --workflow-profile, Snakemake will automatically search for and apply a workflow profile in profiles/default/ (relative to the Snakefile or working directory).
To prevent any workflow profile from being loaded, pass --workflow-profile none explicitly. Whenever this command line argument is given, the default/ location is no longer searched implicitly (unless you explicitly specify --workflow-profile default).
Note
When using modules, the profile will not be propagated to the main workflow importing that module. However, using snakedeploy deploy-workflow to deploy a workflow as a module, will also copy any profiles included under the standard location workflow/profiles (for more info, see the snakedeploy documentation for deploying workflows). Starting from this import, or starting with a new file, users can create a profile for that main workflow.
Any profile you specify on the command line is searched in paths relative to the Snakefile location and the current working directory.
You have the same options to specify it as for Using global profiles:
With a profile name, which represents a subdirectory of one of the standard locations, and is assumed to contain a file called profile.yaml (or config.yaml for backwards compatibility).
With a path to such a directory containing a profile.yaml file, relative to one of the standard locations.
With a path to a YAML file with an arbitrary name, relative to one of the standard locations.
For example, if your Snakefile sits in the recommended location in subfolder workflow/, snakemake --workflow-profile my_profile will look for:
profiles/my_profile/profile.yaml
workflow/profiles/my_profile/profile.yaml
Note, the examples here omit the optional vX+ minimum version infix.
With the same Snakefile location, snakemake --workflow-profile relative_path/to/my_profile will look for:
relative_path/to/my_profile/profile.yaml
profiles/relative_path/to/my_profile/profile.yaml
workflow/profiles/relative_path/to/my_profile/profile.yaml
And finally, assuming that the specified file exists, snakemake --workflow-profile extra_profiles_dir/workflow_profile.yaml will short-circuit the lookup and just use the file that is specified.
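The lookup locations listed in these examples can be generated with a small helper. This sketch only reproduces the paths from the examples above: it treats any value ending in .yaml or .yml as a direct file reference instead of checking that the file exists, it ignores the optional version infix, and its name and defaults are hypothetical.

```python
from pathlib import Path


def workflow_profile_candidates(spec, snakefile_dir="workflow", workdir="."):
    """List the profile files tried for a --workflow-profile value, in order."""
    spec = Path(spec)
    if spec.suffix in (".yaml", ".yml"):
        # Assumption: a direct file reference short-circuits the lookup
        return [Path(workdir) / spec]
    folders = []
    if len(spec.parts) > 1:
        # relative paths are also tried as given
        folders.append(Path(workdir) / spec)
    folders.append(Path(workdir) / "profiles" / spec)
    folders.append(Path(snakefile_dir) / "profiles" / spec)
    return [folder / "profile.yaml" for folder in folders]


for path in workflow_profile_candidates("relative_path/to/my_profile"):
    print(path)
```

For the plain name my_profile, the helper returns only the two locations below profiles/ and workflow/profiles/.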
Whenever a workflow profile is successfully specified, it is parsed after any global profiles. It takes precedence over them, overriding any pre-existing top-level keys that it also specifies, but keeping any top-level keys that it doesn’t contain.
For example, if the --profile global_profile YAML file sets
cores: 50
default-resources:
  mem_mb: 8000
  disk_mb: 20000
and the --workflow-profile workflow_on_xyz sets
default-resources:
  mem_mb: 4000
  extra: something
keep-going: True
the resulting profile configuration will be
cores: 50
default-resources:
  mem_mb: 4000
  extra: something
keep-going: True
Similarly, any specifications in your workflow specific profile will be overwritten by command line arguments of the snakemake run.
So, if you run snakemake --workflow-profile workflow_on_xyz --default-resources mem_mb=1000 --cores 2, the resulting configuration will be:
cores: 2
default-resources:
  mem_mb: 1000
keep-going: True
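Since overriding a top-level key silently drops nested entries, it can help to check which entries would be lost before layering two configurations. The following helper is a hypothetical convenience for that purpose, not something Snakemake provides; it works on already parsed profiles or argument mappings.

```python
def dropped_entries(lower, higher):
    """Report nested entries of `lower` that vanish because `higher`
    redefines their top-level key."""
    dropped = {}
    for key, value in lower.items():
        if key in higher and isinstance(value, dict) and isinstance(higher[key], dict):
            # entries present below the key in `lower` but absent in `higher`
            lost = sorted(set(value) - set(higher[key]))
            if lost:
                dropped[key] = lost
    return dropped


global_profile = {"cores": 50, "default-resources": {"mem_mb": 8000, "disk_mb": 20000}}
workflow_profile = {
    "default-resources": {"mem_mb": 4000, "extra": "something"},
    "keep-going": True,
}
print(dropped_entries(global_profile, workflow_profile))
# {'default-resources': ['disk_mb']}
```

Applied to the workflow profile and the command line arguments from the example, the same check reports that extra is dropped from default-resources.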
Use templating in profiles
In Snakemake 7.30 or newer, when the profile starts with
__use_yte__: true
it will be treated as a YTE template and parsed accordingly. This can be handy to e.g. define values inside of the profile that are based on environment variables. For example, admins could use this to define user-specific settings. Another application would be the uniform redefinition of resource requirements for a larger set of rules in a workflow profile (see above). However, it should be noted that templated profiles are harder to keep free of errors and the profile author has to make sure that they always work correctly for the user.
Visualization
To visualize the workflow, one can use the option --dag.
This creates a representation of the DAG in the graphviz dot language which has to be postprocessed by the graphviz tool dot.
E.g. to visualize the DAG that would be executed, you can issue:
$ snakemake --dag | dot | display
For saving this to a file, you can specify the desired format:
$ snakemake --dag | dot -Tpdf > dag.pdf
To visualize the whole DAG regardless of the eventual presence of files, the forceall option can be used:
$ snakemake --forceall --dag | dot -Tpdf > dag.pdf
Of course the visual appearance can be modified by providing further command line arguments to dot.
Note: The DAG is printed in DOT format straight to the standard output, along with the output of any other print statements you may have in your Snakefile. Make sure to comment out these other print statements so that dot can build a visual representation of your DAG.
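If print statements cannot be removed, the DOT part can alternatively be filtered out of the mixed output before handing it to dot. The sketch below assumes that the graph starts on a line beginning with digraph and ends on a line consisting only of a closing brace; this matches common DOT output but is not guaranteed, and the function name is hypothetical.

```python
def extract_dot(output):
    """Return only the DOT graph from mixed output, dropping surrounding lines."""
    lines = output.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("digraph")),
        None,
    )
    if start is None:
        raise ValueError("no DOT graph found in output")
    # Assumption: the graph ends at the last line consisting of a closing brace
    end = max(
        (i for i, line in enumerate(lines) if i > start and line.strip() == "}"),
        default=None,
    )
    if end is None:
        raise ValueError("DOT graph is not terminated")
    return "\n".join(lines[start:end + 1])
```

The returned text can then be written to a file or piped into dot as in the commands above.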
All Options
All command line options can be printed by calling snakemake -h.
Snakemake is a Python based language and execution environment for GNU Make-like workflows.
usage: snakemake [-h] [--dry-run] [--profile PROFILE]
[--workflow-profile WORKFLOW_PROFILE] [--cache [RULE ...]]
[--snakefile FILE] [--cores N] [--jobs N] [--local-cores N]
[--resources NAME=INT [NAME=INT ...]]
[--set-threads RULE=THREADS [RULE=THREADS ...]]
[--max-threads MAX_THREADS]
[--set-resources RULE:RESOURCE=VALUE [RULE:RESOURCE=VALUE ...]]
[--set-scatter NAME=SCATTERITEMS [NAME=SCATTERITEMS ...]]
[--set-resource-scopes RESOURCE=[global|local] [RESOURCE=[global|local] ...]]
[--default-resources [NAME=INT ...]]
[--preemptible-rules [PREEMPTIBLE_RULES ...]]
[--preemptible-retries PREEMPTIBLE_RETRIES]
[--configfile FILE [FILE ...]] [--config [KEY=VALUE ...]]
[--replace-workflow-config] [--envvars VARNAME [VARNAME ...]]
[--directory DIR] [--touch] [--keep-going]
[--rerun-triggers {code,input,mtime,params,software-env} [{code,input,mtime,params,software-env} ...]]
[--force] [--executor {local,dryrun,touch}] [--forceall]
[--forcerun [TARGET ...]]
[--consider-ancient RULE=INPUTITEMS [RULE=INPUTITEMS ...]]
[--prioritize TARGET [TARGET ...]]
[--batch RULE=BATCH/BATCHES] [--until TARGET [TARGET ...]]
[--omit-from TARGET [TARGET ...]] [--rerun-incomplete]
[--shadow-prefix DIR]
[--strict-dag-evaluation {cyclic-graph,functions,periodic-wildcards} [{cyclic-graph,functions,periodic-wildcards} ...]]
[--scheduler [{greedy,ilp}]]
[--conda-base-path CONDA_BASE_PATH] [--no-subworkflows]
[--precommand PRECOMMAND] [--groups GROUPS [GROUPS ...]]
[--group-components GROUP_COMPONENTS [GROUP_COMPONENTS ...]]
[--report [FILE]] [--report-after-run]
[--report-stylesheet CSSFILE] [--report-metadata FILE]
[--reporter PLUGIN] [--draft-notebook TARGET]
[--edit-notebook TARGET] [--notebook-listen IP:PORT]
[--lint [{text,json}]] [--generate-unit-tests [TESTPATH]]
[--containerize [{dockerfile,apptainer}]] [--export-cwl FILE]
[--list-rules] [--list-target-rules]
[--dag [{dot,mermaid-js}]] [--rulegraph [{dot,mermaid-js}]]
[--filegraph] [--d3dag] [--summary] [--detailed-summary]
[--archive FILE] [--cleanup-metadata FILE [FILE ...]]
[--cleanup-shadow] [--skip-script-cleanup] [--unlock]
[--list-changes {params,input,code}] [--list-input-changes]
[--list-params-changes] [--list-untracked]
[--delete-all-output | --delete-temp-output]
[--keep-incomplete] [--drop-metadata] [--version]
[--printshellcmds] [--debug-dag] [--nocolor]
[--quiet [{all,host,progress,reason,rules} ...]]
[--print-compilation] [--verbose] [--force-use-threads]
[--allow-ambiguity] [--nolock] [--ignore-incomplete]
[--max-inventory-time SECONDS] [--trust-io-cache]
[--max-checksum-file-size SIZE] [--latency-wait SECONDS]
[--wait-for-free-local-storage WAIT_FOR_FREE_LOCAL_STORAGE]
[--wait-for-files [FILE ...]] [--wait-for-files-file FILE]
[--runtime-source-cache-path PATH]
[--queue-input-wait-time SECONDS]
[--omit-flags OMIT_FLAGS [OMIT_FLAGS ...]] [--notemp]
[--all-temp] [--unneeded-temp-files FILE [FILE ...]]
[--keep-storage-local-copies] [--not-retrieve-storage]
[--target-files-omit-workdir-adjustment]
[--allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]]
[--max-jobs-per-timespan MAX_JOBS_PER_TIMESPAN]
[--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND]
[--seconds-between-status-checks SECONDS_BETWEEN_STATUS_CHECKS]
[--retries RETRIES] [--wrapper-prefix WRAPPER_PREFIX]
[--default-storage-provider DEFAULT_STORAGE_PROVIDER]
[--default-storage-prefix DEFAULT_STORAGE_PREFIX]
[--local-storage-prefix LOCAL_STORAGE_PREFIX]
[--remote-job-local-storage-prefix REMOTE_JOB_LOCAL_STORAGE_PREFIX]
[--shared-fs-usage {input-output,persistence,software-deployment,software-deployment-cache,source-cache,sources,storage-local-copies,none} [{input-output,persistence,software-deployment,software-deployment-cache,source-cache,sources,storage-local-copies,none} ...]]
[--scheduler-greediness SCHEDULER_GREEDINESS]
[--scheduler-subsample SCHEDULER_SUBSAMPLE] [--no-hooks]
[--debug] [--runtime-profile FILE]
[--local-groupid LOCAL_GROUPID] [--attempt ATTEMPT]
[--show-failed-logs] [--logger {} [{} ...]]
[--job-deploy-sources] [--benchmark-extended]
[--persistence-backend {db,file}]
[--persistence-backend-db-url PERSISTENCE_BACKEND_DB_URL]
[--container-image IMAGE] [--immediate-submit]
[--jobscript SCRIPT] [--jobname NAME]
[--software-deployment-method {apptainer,conda,env-modules} [{apptainer,conda,env-modules} ...]]
[--container-cleanup-images] [--use-conda]
[--conda-not-block-search-path-envvars] [--list-conda-envs]
[--conda-prefix DIR] [--conda-cleanup-envs]
[--conda-cleanup-pkgs [{tarballs,cache}]]
[--conda-create-envs-only] [--conda-frontend {conda,mamba}]
[--use-apptainer] [--apptainer-prefix DIR]
[--apptainer-args ARGS] [--use-envmodules]
[--deploy-sources QUERY CHECKSUM]
[--target-jobs TARGET_JOBS [TARGET_JOBS ...]]
[--mode {subprocess,default,remote}]
[--scheduler-solver-path SCHEDULER_SOLVER_PATH]
[--max-jobs-per-second MAX_JOBS_PER_SECOND]
[--report-html-path VALUE]
[--report-html-stylesheet-path VALUE]
[--scheduler-greedy-greediness VALUE]
[--scheduler-greedy-omit-prioritize-by-temp-and-input]
[--scheduler-ilp-solver VALUE]
[--scheduler-ilp-solver-path VALUE]
[targets ...]
EXECUTION
- targets
Targets to build. May be rules or files.
Default:
set()
- --dry-run, --dryrun, -n
Do not execute anything, and display what would be done. If you have a very large workflow, use --dry-run --quiet to just print a summary of the DAG of jobs.
Default:
False
- --profile
Profile to use for configuring the Snakemake run with settings regarding the compute environment. Every key in this YAML file gets parsed into the respective command line argument: executor: slurm gets parsed to --executor slurm, default-resources: mem_mb: 16000 is interpreted as --default-resources mem_mb=16000, etc. You can specify a Snakemake profile as (i) a profile name, (ii) a relative path to a folder or (iii) the relative path to the profile YAML file itself. Snakemake will look for a folder with the profile name or the existence of the relative path in /etc/xdg/snakemake, $HOME/.config/snakemake and the current working directory. Alternatively, you can also specify absolute paths. If a profile name or folder is given, it has to contain a file profile.yaml (or a config.yaml file, for backwards compatibility). This file can have an optional infix specifying a minimal snakemake version (for example profile.v9+.yaml). The profile can also be set via the environment variable $SNAKEMAKE_PROFILE. However, once you provide a profile via the command line argument --profile, this environment variable is ignored. And to override this variable without setting another one, provide the value none to this argument. Finally, you can specify this argument multiple times. In this case, the profiles get merged with the later --profile instances overriding top-level entries in profiles specified earlier. For example, if the last --profile specifies the top level default-resources: keyword, all entries under that keyword from previous --profile instances will be ignored. Similarly, also specifying any of the top-level keys from your profile as a command line argument will overwrite this whole top-level key. Example profiles for certain compute infrastructure can be obtained at https://github.com/snakemake/snakemake-cluster-profiles.
- --workflow-profile
Profile to use for configuring this Snakemake run with parameters specific for this workflow (like resources). For settings specific to the compute environment (for example a specific compute cluster), use global --profile instances. Generally, an entry like set-resources: a: mem_mb=8 in the YAML file will become --set-resources a:mem_mb=8 for the snakemake run. The profile can be specified as a file name with a full relative path from the current working directory. In this case, the YAML profile file can be named arbitrarily. In all other cases the respective folder(s) will be searched for a profile.yaml file (or a config.yaml file, for backwards compatibility). This file can have an optional infix specifying a minimal snakemake version (for example profile.v9+.yaml). And any of the following options will always search relative to both the current working directory and the location of the Snakefile: (i) If this option is not provided, the directory profiles/default/ will be searched (and used, if a profile is present; override this implicit usage with --workflow-profile none). (ii) If a profile name is given, the subdirectory of that name under profiles/ will be searched. (iii) If a full relative path is given, this directory will be searched. Settings made in the workflow profile will override settings made in the general profile (see --profile) on a per-key basis. For example, if you specify default-resources: in the workflow profile, all default-resources: entries from other profiles will be ignored; but if you don’t specify default-resources in your workflow profile, default-resources from other profiles will get passed through. Similarly, also specifying any of the top-level keys from your workflow specific profile via command line arguments will completely overwrite their entries.
- --cache
Store output files of given rules in a central cache given by the environment variable $SNAKEMAKE_OUTPUT_CACHE. Likewise, retrieve output files of the given rules from this cache if they have been created before (by anybody writing to the same cache), instead of actually executing the rules. Output files are identified by hashing all steps, parameters and software stack (conda envs or containers) needed to create them. If no rules are given, all rules that are eligible for caching (have a cache directive, see docs) are cached.
- --snakefile, -s
The workflow definition in form of a snakefile. Usually, you should not need to specify this. By default, Snakemake will search for Snakefile, snakefile, workflow/Snakefile, workflow/snakefile beneath the current working directory, in this order. Only if you definitely want a different layout, you need to use this parameter.
- --cores, -c
Use at most N CPU cores/jobs in parallel. If N is omitted or all, the limit is set to the number of available CPU cores. In case of cluster/cloud execution, this argument sets the maximum number of cores requested from the cluster or cloud scheduler. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.) This number is available to rules via workflow.cores.
- --jobs, -j
Use at most N cluster/cloud jobs in parallel. For local execution this is an alias for --cores (although using --cores is recommended in that case). Note: Set to unlimited to allow any number of parallel jobs.
- --local-cores
In cluster/cloud mode, use at most N cores of the host machine in parallel (default: number of CPU cores of the host). The cores are used to execute local rules. This option is ignored when not in cluster/cloud mode.
- --resources, --res
Define additional resources that shall constrain the scheduling analogously to --cores (see above). A resource is defined as a name and an integer value, e.g. --resources mem_mb=1000. Rules can use resources by defining the resources keyword, e.g. resources: mem_mb=600. If two rules each require 600 of the resource mem_mb, the scheduler will not run them in parallel, because together they would exceed the limit of 1000. In cluster/cloud mode, this argument will also constrain the amount of resources requested from the server. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.)
Default:
{}
- --set-threads
Overwrite thread usage of rules. This allows fine-tuning of workflow parallelization. In particular, this is helpful to target certain cluster nodes by e.g. shifting a rule to use more or fewer threads than defined in the workflow. THREADS has to be a positive integer, and RULE has to be the name of the rule.
Default:
{}
- --max-threads
Define a global maximum number of threads available to any rule. Rules requesting more threads (via the threads keyword) will have their values reduced to the maximum. This can be useful when you want to restrict the maximum number of threads without modifying the workflow definition or overwriting rules individually with --set-threads.
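The clamping behaviour described above amounts to taking a minimum. A minimal sketch, assuming a hypothetical helper effective_threads (not a Snakemake function):

```python
def effective_threads(requested, max_threads=None):
    """Reduce a rule's requested thread count to the global maximum,
    if one was given; otherwise leave the request unchanged."""
    if max_threads is None:
        return requested
    return min(requested, max_threads)


# A rule asking for 16 threads under a global maximum of 8 gets 8;
# a rule asking for 4 is left alone.
print(effective_threads(16, max_threads=8))
print(effective_threads(4, max_threads=8))
```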
- --set-resources
Overwrite resource usage of rules. This allows fine-tuning of workflow resources. In particular, this is helpful to target certain cluster nodes by e.g. defining a certain partition for a rule, or overriding a temporary directory. VALUE has to be a positive integer or a string, RULE has to be the name of the rule, and RESOURCE has to be the name of the resource.
Default:
{}
- --set-scatter
Overwrite number of scatter items of scattergather processes. This allows fine-tuning of workflow parallelization. SCATTERITEMS has to be a positive integer, and NAME has to be the name of the scattergather process defined via a scattergather directive in the workflow.
Default:
{}
- --set-resource-scopes
Overwrite resource scopes. A scope determines how a constraint is reckoned in cluster execution. With RESOURCE=local, a constraint applied to RESOURCE using --resources will be considered the limit for each group submission. With RESOURCE=global, the constraint will apply across all groups cumulatively. By default, only mem_mb and disk_mb are considered local, all other resources are global. This may be modified in the snakefile using the resource_scopes: directive. Note that number of threads, specified via --cores, is always considered local. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.)
Default:
{}
- --default-resources, --default-res
Define default values of resources for rules that do not define their own values. In addition to plain integers, python expressions over inputsize are allowed (e.g. 2*input.size_mb). The inputsize is the sum of the sizes of all input files of a rule. By default, Snakemake assumes a default for mem_mb, disk_mb, and tmpdir (see below). This option allows to add further defaults (e.g. account and partition for slurm) or to overwrite these default values. The defaults are mem_mb=min(max(2*input.size_mb, 1000), 8000), disk_mb=max(2*input.size_mb, 1000) (i.e., default disk and mem usage is twice the input file size but at least 1GB), and the system temporary directory (as given by $TMPDIR, $TEMP, or $TMP) is used for the tmpdir resource. The tmpdir resource is automatically used by shell commands, scripts and wrappers to store temporary data (as it is mirrored into $TMPDIR, $TEMP, and $TMP for the executed subprocesses). If this argument is not specified at all, Snakemake just uses the tmpdir resource as outlined above. The tmpdir resource can also be overwritten in the same way as e.g. mem_mb above. Thereby, it is even possible to use shutil.disk_usage(system_tmpdir).free and comparing this to input.size in order to determine if one can expect the system_tmpdir to be big enough and switch to another tmpdir in case it is not.
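To make the default formulas above concrete, here is a small sketch that evaluates them for a few input sizes and also illustrates the free-space check mentioned at the end. The function names and the fallback path are hypothetical; only the formulas and shutil.disk_usage are taken from the description.

```python
import shutil


def default_mem_mb(input_size_mb):
    # Twice the input size, at least 1000 MB, capped at 8000 MB.
    return min(max(2 * input_size_mb, 1000), 8000)


def default_disk_mb(input_size_mb):
    # Twice the input size, at least 1000 MB, with no upper cap.
    return max(2 * input_size_mb, 1000)


def pick_tmpdir(needed_bytes, system_tmpdir, fallback):
    """Use system_tmpdir if it has enough free space, else fallback.
    Assumption: needed_bytes is a size estimate derived from the inputs."""
    if shutil.disk_usage(system_tmpdir).free >= needed_bytes:
        return system_tmpdir
    return fallback


for size_mb in (100, 3000, 10000):
    print(size_mb, default_mem_mb(size_mb), default_disk_mb(size_mb))
```

For 100 MB of input both defaults sit at the 1000 MB floor, for 3000 MB both are 6000 MB, and for 10000 MB the memory default is capped at 8000 MB while the disk default grows to 20000 MB.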
- --preemptible-rules
Define which rules shall use a preemptible machine which can be prematurely killed by e.g. a cloud provider (also called spot instances). This is currently only supported by the Google Life Sciences executor and ignored by all other executors. If no rule names are provided, all rules are considered to be preemptible.
- --preemptible-retries
Number of retries that shall be made in order to finish a job of a rule that has been marked as preemptible via the --preemptible-rules setting.
- --configfile, --configfiles
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order. Thereby missing keys in previous config files are extended by following configfiles. Note that this order also includes a config file defined in the workflow definition itself (which will come first).
Default:
[]
- --config, -C
Set or overwrite values in the workflow config object. The workflow config object is accessible as variable config inside the workflow. Default values can be set by providing a YAML or JSON file (see --configfile and Documentation). Nested values must be defined in Python dict format, e.g., --config "foo={'bar': 42}".
- --replace-workflow-config
Config files provided via command line do not update and extend the config dictionary of the workflow but instead fully replace it. Keys that are not defined in the provided config files will be undefined even if specified within the workflow config.
Default:
False
- --envvars
Environment variables to pass to cloud jobs.
Default:
set()
- --directory, -d
Specify working directory (relative paths in the snakefile will use this as their origin).
- --touch, -t
Touch output files (mark them up to date without really changing them) instead of running their commands. This is used to pretend that the rules were executed, in order to fool future invocations of snakemake. Fails if a file does not yet exist. Note that this will only touch files that would otherwise be recreated by Snakemake (e.g. because their input files are newer). For enforcing a touch, combine this with --force, --forceall, or --forcerun. Note however that you lose the provenance information when the files have been created in reality. Hence, this should be used only as a last resort.
Default:
False
- --keep-going, -k
Go on with independent jobs if a job fails during execution. This only applies to runtime failures in job execution, not to errors during workflow parsing or DAG construction.
Default:
False
- --rerun-triggers
Possible choices: code, input, mtime, params, software-env
Define what triggers the rerunning of a job. By default, all triggers are used, which guarantees that results are consistent with the workflow code and configuration. If you prefer the traditional way of just considering file modification dates, use --rerun-triggers mtime.
Default:
frozenset({<RerunTrigger.INPUT: 2>, <RerunTrigger.PARAMS: 1>, <RerunTrigger.MTIME: 0>, <RerunTrigger.CODE: 4>, <RerunTrigger.SOFTWARE_ENV: 3>})
- --force, -f
Force the execution of the selected target or the first rule regardless of already created output.
Default:
False
- --executor, -e
Possible choices: local, dryrun, touch
Specify a custom executor, available via an executor plugin: snakemake_executor_<name>
- --forceall, -F
Force the execution of the selected (or the first) rule and all rules it is dependent on regardless of already created output.
Default:
False
- --forcerun, -R
Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want to have all its output in your workflow updated.
Default:
set()
- --consider-ancient
Consider given input items of given rules as ancient, i.e. not triggering re-runs if they are newer than the output files. Putting this into a workflow specific profile (or specifying as argument) allows to overrule rerun triggers caused by file modification dates where the user knows better. RULE is the name of the rule, INPUTITEMS is a comma separated list of input items of the rule (given as name or index (0-based)).
Default:
{}
- --prioritize, -P
Tell the scheduler to assign creation of given targets (and all their dependencies) highest priority.
Default:
set()
- --batch
Only create the given BATCH of the input files of the given RULE. This can be used to iteratively run parts of very large workflows. Only the execution plan of the relevant part of the workflow has to be calculated, thereby speeding up DAG computation. It is recommended to provide the most suitable rule for batching when documenting a workflow. It should be some aggregating rule that would be executed only once, and has a large number of input files. For example, it can be a rule that aggregates over samples.
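The idea of processing only one batch of a rule's input files can be illustrated with a simple partitioning sketch. This is an assumption about how such a split could look, not a description of Snakemake's internal batching code; select_batch and the file names are hypothetical.

```python
def select_batch(input_files, batch, batches):
    """Return the batch-th (1-based) of `batches` contiguous slices
    of the sorted input files. The last batch takes any remainder."""
    if not 1 <= batch <= batches:
        raise ValueError("batch must be between 1 and batches")
    items = sorted(input_files)
    size = len(items) // batches
    start = (batch - 1) * size
    end = len(items) if batch == batches else start + size
    return items[start:end]


# Hypothetical inputs of an aggregating rule, split into three batches.
files = [f"s{i}.txt" for i in range(7)]
for b in (1, 2, 3):
    print(b, select_batch(files, b, 3))
```

Running the workflow once per batch then covers all input files of the aggregating rule, while each run only has to plan the part of the DAG that leads to its own slice.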
- --until, -U
Runs the pipeline until it reaches the specified rules or files. Only runs jobs that are dependencies of the specified rule or files, does not run sibling DAGs.
Default:
set()
- --omit-from, -O
Prevent the execution or creation of the given rules or files as well as any rules or files that are downstream of these targets in the DAG. Also runs jobs in sibling DAGs that are independent of the rules or files specified here.
Default:
set()
- --rerun-incomplete, --ri
Re-run all jobs the output of which is recognized as incomplete.
Default:
False
- --shadow-prefix
Specify a directory in which the shadow directory is created. If not supplied, the value is set to the .snakemake directory relative to the working directory.
- --strict-dag-evaluation
Possible choices: cyclic-graph, functions, periodic-wildcards
Strict evaluation of rules’ correctness even when not required to produce the output files.
Default:
set()
- --scheduler
Possible choices: greedy, ilp
Specifies the scheduling plugin to use. Builtin plugins are greedy (fast) and ilp; the latter aims to reduce runtime and disk usage by making the best possible use of resources.
Default:
'ilp'
- --conda-base-path
Path of conda base installation (home of conda, mamba, activate) (internal use only).
- --no-subworkflows, --nosw
Do not evaluate or execute subworkflows.
Default:
False
- --precommand
Only used in case of remote execution. Command to be executed before Snakemake executes each job on the remote compute node.
- --dag
Possible choices: dot, mermaid-js
Do not execute anything and print the directed acyclic graph of jobs in the dot language or in mermaid-js. Recommended use on Unix systems: snakemake --dag | dot | display. Note that print statements in your Snakefile may interfere with visualization.
GROUPING
- --groups
Assign rules to groups (this overwrites any group definitions from the workflow).
Default:
{}
- --group-components
Set the number of connected components a group is allowed to span. By default, this is 1, but this flag allows extending it. This can be used to run e.g. 3 jobs of the same rule in the same group, although they are not connected. It can be helpful for putting together many small jobs or benefiting from shared memory setups.
Default:
{}
REPORTS
- --report
Create a self-contained HTML report with default statistics, provenance information and user-specified results. For smaller datasets with a limited report complexity, you can specify an .html file and all results will be embedded directly into this file. For customized reports on larger sample sizes, it makes more sense to specify a .zip file. The resulting archive will spread the contents across a folder structure, for a quicker loading of individual results. You can unpack this archive anywhere and open the report.html file in its main folder to view the report in any web browser.
- --report-after-run
After finishing the workflow, directly create the report. It is required to provide --report.
Default:
False
- --report-stylesheet
Custom stylesheet to use for report. In particular, this can be used for branding the report with e.g. a custom logo, see docs.
- --report-metadata
Custom metadata to use for the landing page of the report. In particular, this can be used to provide metadata in the report e.g. the work directory, see docs.
- --reporter
Specify a custom report plugin. By default, Snakemake’s builtin html reporter will be used. For custom reporters, check out their command line options starting with --report-.
NOTEBOOKS
- --draft-notebook
Draft a skeleton notebook for the rule used to generate the given target file. This notebook can then be opened in a jupyter server, executed and implemented until ready. After saving, it will automatically be reused in non-interactive mode by Snakemake for subsequent jobs.
- --edit-notebook
Interactively edit the notebook associated with the rule used to generate the given target file. This will start a local jupyter notebook server. Any changes to the notebook should be saved, and the server has to be stopped by closing the notebook and hitting the Quit button on the jupyter dashboard. Afterwards, the updated notebook will be automatically stored in the path defined in the rule. If the notebook is not yet present, this will create an empty draft.
- --notebook-listen
The IP address and PORT on which the notebook server used for editing the notebook (--edit-notebook) will listen.
Default:
'localhost:8888'
UTILITIES
- --lint
Possible choices: text, json
Perform linting on the given workflow. This will print snakemake specific suggestions to improve code quality (work in progress, more lints to be added in the future). If no argument is provided, plain text output is used.
- --generate-unit-tests
Automatically generate unit tests for each workflow rule. This assumes that all input files of each job are already present. Jobs without present input files will be skipped (a warning will be issued). For each rule, one test case will be created and, after successful execution, tests can be run with pytest TESTPATH.
- --containerize
Possible choices: dockerfile, apptainer
Print a container definition that provides an execution environment for the workflow, including all conda environments. Supported formats: dockerfile (default), apptainer.
- --export-cwl
Compile workflow to CWL and store it in given FILE.
- --list-rules, --list, -l
Show available rules in given Snakefile.
Default:
False
- --list-target-rules, --lt
Show available target rules in given Snakefile.
Default:
False
- --rulegraph
Possible choices: dot, mermaid-js
Do not execute anything and print the dependency graph of rules in the dot language or in mermaid-js. This will be less crowded than above DAG of jobs, but also show less information. Note that each rule is displayed once, hence the displayed graph will be cyclic if a rule appears in several steps of the workflow. Use this if above option leads to a DAG that is too large. Recommended use on Unix systems: snakemake --rulegraph | dot | display. Note that print statements in your Snakefile may interfere with visualization.
- --filegraph
Do not execute anything and print the dependency graph of rules with their input and output files in the dot language. This is an intermediate solution between above DAG of jobs and the rule graph. Note that each rule is displayed once, hence the displayed graph will be cyclic if a rule appears in several steps of the workflow. Use this if above option leads to a DAG that is too large. Recommended use on Unix systems: snakemake --filegraph | dot | display. Note that print statements in your Snakefile may interfere with visualization.
Default:
False
- --d3dag
Print the DAG in D3.js compatible JSON format.
Default:
False
- --summary, -S
Print a summary of all files created by the workflow. The summary has the following columns: filename, modification time, rule version, status, plan. Here, rule version contains the version the file was created with (see the version keyword of rules), and status denotes whether the file is missing, its input files are newer, or the version or implementation of the rule changed since file creation. Finally, the last column denotes whether the file will be updated or created during the next workflow execution.
Default:
False
- --detailed-summary, -D
Print a summary of all files created by the workflow. The summary has the following columns: filename, modification time, rule version, input file(s), shell command, status, plan. Here, rule version contains the version the file was created with (see the version keyword of rules), and status denotes whether the file is missing, its input files are newer, or the version or implementation of the rule changed since file creation. The input file and shell command columns are self-explanatory. Finally, the last column denotes whether the file will be updated or created during the next workflow execution.
Default:
False
- --archive
Archive the workflow into the given tar archive FILE. The archive will be created such that the workflow can be re-executed on a vanilla system. The function needs conda and git to be installed. It will archive every file that is under git version control. Note that it is best practice to have the Snakefile, config files, and scripts under version control. Hence, they will be included in the archive. Further, it will add conda environments and input files that are not generated by the workflow itself. Note that symlinks are dereferenced. Supported formats are .tar, .tar.gz, .tar.bz2 and .tar.xz.
- --cleanup-metadata, --cm
Cleanup the metadata of given files. That means that snakemake removes any tracked version info, and any marks that files are incomplete.
- --cleanup-shadow
Cleanup old shadow directories which have not been deleted due to failures or power loss.
Default:
False
- --skip-script-cleanup
Don’t delete wrapper scripts used for execution.
Default:
False
- --unlock
Remove a lock on the working directory.
Default:
False
- --list-changes, --lc
Possible choices: params, input, code
List all output files for which the given items (code, input, params) have changed since creation.
- --list-input-changes, --li
List all output files for which the defined input files have changed in the Snakefile (e.g. new input files were added in the rule definition or files were renamed). For listing input file modification in the filesystem, use --summary.
Default:
False
- --list-params-changes, --lp
List all output files for which the defined params have changed in the Snakefile.
Default:
False
- --list-untracked, --lu
List all files in the working directory that are not used in the workflow. This can be used e.g. for identifying leftover files. Hidden files and directories are ignored.
Default:
False
- --delete-all-output
Remove all files generated by the workflow. Use together with --dry-run to list files without actually deleting anything. Note that this will not recurse into subworkflows. Write-protected files are not removed. Nevertheless, use with care!
Default:
False
- --delete-temp-output
Remove all temporary files generated by the workflow. Use together with --dry-run to list files without actually deleting anything. Note that this will not recurse into subworkflows.
Default:
False
- --keep-incomplete
Do not remove incomplete output files by failed jobs.
Default:
False
- --drop-metadata
Drop metadata file tracking information after job finishes. Provenance-information based reports (e.g. --report and the --list_x_changes functions) will be empty or incomplete.
Default:
False
- --version, -v
Show program’s version number and exit.
OUTPUT
- --printshellcmds, -p
Print out the shell commands that will be executed.
Default:
False
- --debug-dag
Print candidate and selected jobs (including their wildcards) while inferring DAG. This can help to debug unexpected DAG topology or errors.
Default:
False
- --nocolor
Do not use a colored output.
Default:
False
- --quiet, -q
Possible choices: all, host, progress, reason, rules
Do not output certain information. If used without arguments, do not output any progress or rule information. Defining all results in no information being printed at all.
- --print-compilation
Print the python representation of the workflow.
Default:
False
- --verbose
Print debugging output.
Default:
False
BEHAVIOR
- --force-use-threads
Force threads rather than processes. Helpful if shared memory (/dev/shm) is full or unavailable.
Default:
False
- --allow-ambiguity, -a
Don’t check for ambiguous rules and simply use the first if several can produce the same file. This allows the user to prioritize rules by their order in the snakefile.
Default:
False
- --nolock
Do not lock the working directory.
Default:
False
- --ignore-incomplete, --ii
Do not check for incomplete output files.
Default:
False
- --max-inventory-time
Spend at most SECONDS seconds to create a file inventory for the working directory. The inventory vastly speeds up file modification and existence checks when computing which jobs need to be executed. However, creating the inventory itself can be slow, e.g. on network file systems. Hence, we do not spend more than a given amount of time and fall back to individual checks for the rest.
Default:
20
- --trust-io-cache
Tell Snakemake to assume that all input and output file existence and modification time queries performed in previous dryruns are still valid and therefore don’t have to be repeated. This can lead to speed-ups, but implies that input and output have not been modified manually in between. Non dry-run execution will automatically invalidate the cache and lead to redoing the queries.
Default:
False
- --max-checksum-file-size
Compute the checksum during DAG computation and job postprocessing only for files that are smaller than the provided threshold (given in any valid size unit, e.g. 1MB, which is also the default).
Default:
1000000
- --latency-wait, --output-wait, -w
Wait given seconds if an output file of a job is not present after the job finished. This helps if your filesystem suffers from latency.
Default:
5
- --wait-for-free-local-storage
Wait for given timespan for enough free local storage when downloading remote storage files. If not set, no waiting is performed.
- --wait-for-files
Wait --latency-wait seconds for these files to be present before executing the workflow. This option is used internally to handle filesystem latency in cluster environments.
- --wait-for-files-file
Same behaviour as --wait-for-files, but the file list is stored in a file instead of being passed on the command line. This is useful when the list of files is too long to be passed on the command line. Meant for internal use.
- --runtime-source-cache-path
Path to the runtime source cache directory. Meant for internal use.
- --queue-input-wait-time
Set the interval in seconds to check for new input in rules that use from_queue to obtain input files.
Default:
10
- --omit-flags
Omit the given input and output file flags (e.g. pipe). This can be useful for debugging.
Default:
frozenset()
- --notemp, --no-temp, --nt
Ignore temp() declarations. This is useful when running only a part of the workflow, since temp() would lead to deletion of probably needed files by other parts of the workflow.
Default:
False
- --all-temp
Mark all output files as temp files. This can be useful for CI testing, in order to save space.
Default:
False
- --unneeded-temp-files
Given files will not be uploaded to storage and immediately deleted after job or group job completion.
Default:
frozenset()
- --keep-storage-local-copies
Keep local copies of remote input and output files.
Default:
False
- --not-retrieve-storage
Do not retrieve remote files (default is to retrieve remote files).
Default:
False
- --target-files-omit-workdir-adjustment
Do not adjust the paths of given target files relative to the working directory.
Default:
False
- --allowed-rules
Only consider given rules. If omitted, all rules in Snakefile are used. Note that this is intended primarily for internal use or debugging and may lead to unexpected results otherwise.
- --max-jobs-per-timespan
Maximal number of job submissions/executions per timespan. Format: <number>/<timespan>, e.g. 50/1m or 0.5/1s.
Default:
100/1s
- --max-status-checks-per-second
Maximal number of job status checks per second; fractions allowed.
Default:
10
- --seconds-between-status-checks
Number of seconds to wait between two rounds of status checks.
Default:
10
- --retries, --restart-times, -T
Number of times to restart failing jobs.
Default:
0
- --wrapper-prefix
URL prefix for wrapper directive. Set this to use your fork or a local clone of the repository, e.g., use a git URL like git+file://path/to/your/local/clone@.
- --default-storage-provider
Specify default storage provider to be used for all input and output files that don’t yet specify one (e.g. s3). See https://snakemake.github.io/snakemake-plugin-catalog for available storage provider plugins. If not set or explicitly none, no default storage provider will be used.
- --default-storage-prefix
Specify prefix for default storage provider. E.g. a bucket name.
Default:
''
- --local-storage-prefix
Specify prefix for storing local copies of storage files and folders (e.g. local scratch disk). Environment variables will be expanded.
Default:
.snakemake/storage
- --remote-job-local-storage-prefix
Specify prefix for storing local copies of storage files and folders (e.g. local scratch disk) in case of remote jobs (e.g. cluster or cloud jobs). Environment variables will be expanded within the remote job.
Default:
.snakemake/storage
- --shared-fs-usage
Possible choices: input-output, persistence, software-deployment, software-deployment-cache, source-cache, sources, storage-local-copies, none
Set assumptions on shared filesystem for non-local workflow execution. To disable any sharing via the filesystem, specify none. Usually, the executor plugin sets this to a correct default. However, sometimes it is worth tuning this setting, e.g. for optimizing cluster performance. For example, when using --default-storage-provider fs together with a cluster executor like slurm, you might want to set --shared-fs-usage persistence software-deployment sources source-cache, such that software deployment and data provenance will be handled by NFS but input and output files will be handled exclusively by the storage provider.
Default:
frozenset({<SharedFSUsage.STORAGE_LOCAL_COPIES: 4>, <SharedFSUsage.PERSISTENCE: 0>, <SharedFSUsage.SOURCE_CACHE: 5>, <SharedFSUsage.SOFTWARE_DEPLOYMENT: 2>, <SharedFSUsage.INPUT_OUTPUT: 1>, <SharedFSUsage.SOFTWARE_DEPLOYMENT_CACHE: 6>, <SharedFSUsage.SOURCES: 3>})
- --scheduler-greediness, --greediness
Set the greediness of scheduling. This value between 0 and 1 determines how carefully jobs are selected for execution. The default value (1.0) provides the best speed and still acceptable scheduling quality. Deprecated in favor of --scheduler-greedy-greediness.
Default:
1.0
- --scheduler-subsample
Set the number of jobs to be considered for scheduling. If number of ready jobs is greater than this value, this number of jobs is randomly chosen for scheduling; if number of ready jobs is lower, this option has no effect. This can be useful on very large DAGs, where the scheduler can take some time selecting which jobs to run.
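The subsampling rule described above can be sketched as follows. This is an illustrative model only; subsample_ready_jobs is a hypothetical name and the jobs are stand-in integers rather than Snakemake job objects.

```python
import random


def subsample_ready_jobs(ready_jobs, subsample=None, rng=random):
    """Return the jobs handed to the scheduler: all of them if no limit
    is set or the limit is not exceeded, else a random subset."""
    jobs = list(ready_jobs)
    if subsample is None or len(jobs) <= subsample:
        return jobs
    return rng.sample(jobs, subsample)


ready = list(range(10))
print(len(subsample_ready_jobs(ready, subsample=3)))   # limit exceeded: subset
print(len(subsample_ready_jobs(ready, subsample=20)))  # limit not reached: all jobs
```

The trade-off is that the scheduler optimizes over fewer candidates per round, which shortens each scheduling step on very large DAGs at the cost of possibly missing a better job combination.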
- --no-hooks
Do not invoke onstart, onsuccess or onerror hooks after execution.
Default:
False
- --debug
Allow debugging rules with e.g. PDB. This flag makes it possible to set breakpoints in run blocks.
Default:
False
- --runtime-profile
Profile Snakemake and write the output to FILE. This requires yappi to be installed.
- --local-groupid
Internal use only: Name for local groupid.
Default:
'local'
- --attempt
Internal use only: define the initial value of the attempt parameter.
Default:
1
- --show-failed-logs
Automatically display logs of failed jobs.
Default:
False
- --logger
Specify one or more custom loggers, available via logger plugins.
Default:
[]
- --job-deploy-sources
Whether the workflow sources shall be deployed before a remote job is started. Only applies if --no-shared-fs is set or executors are used that imply no shared FS (e.g. the kubernetes executor).
Default:
False
- --benchmark-extended
Write extended benchmarking metrics.
Default:
False
- --persistence-backend
Possible choices: db, file
The backend to use for Snakemake’s metadata persistence. The ‘file’ backend uses a file system directory structure. The ‘db’ backend uses a relational database via SQLAlchemy.
Default:
file
- --persistence-backend-db-url
The database URL to use for the ‘db’ persistence backend (e.g., ‘sqlite:///.snakemake/metadata.db’, ‘postgresql://user@host/db’). Only used if --persistence-backend is ‘db’.
REMOTE EXECUTION
- --container-image
Docker image to use, e.g., when submitting jobs to kubernetes. Defaults to https://hub.docker.com/r/snakemake/snakemake, tagged with the same version as the currently running Snakemake instance. Note that overwriting this value is your own responsibility. Any used image has to contain a working snakemake installation that is compatible with (or ideally the same as) the currently running version.
Default:
'snakemake/snakemake:v9.22.0'
- --immediate-submit, --is
Immediately submit all jobs to the cluster instead of waiting for present input files. This will fail, unless you make the cluster aware of job dependencies, e.g. via: $ snakemake --cluster 'sbatch --dependency {dependencies}'. Assuming that your submit script (here sbatch) outputs the generated job id to the first stdout line, {dependencies} will be filled with space separated job ids this job depends on. Does not work for workflows that contain checkpoint rules, and localrules will be skipped. The additional argument --notemp should be specified. Most often, --not-retrieve-storage is also recommended to avoid Snakemake trying to download output files before the jobs producing them are executed.
Default:
False
- --jobscript, --js
Provide a custom job script for submission to the cluster. The default script resides as jobscript.sh in the installation directory.
- --jobname, --jn
Provide a custom name for the jobscript that is submitted to the cluster (see --cluster). The wildcard {jobid} has to be present in the name.
Default:
'snakejob.{name}.{jobid}.sh'
SOFTWARE DEPLOYMENT
- --software-deployment-method, --deployment-method, --deployment, --sdm
Possible choices: apptainer, conda, env-modules
Specify software environment deployment method.
Default:
set()
- --container-cleanup-images
Remove unused containers.
Default:
False
CONDA
- --use-conda
If defined in the rule, run job in a conda environment. If this flag is not set, the conda directive is ignored.
Default:
False
- --conda-not-block-search-path-envvars
Do not block environment variables that modify the search path (R_LIBS, PYTHONPATH, PERL5LIB, PERLLIB) when using conda environments.
Default:
False
- --list-conda-envs
List all conda environments and their location on disk.
Default:
False
- --conda-prefix
Specify a directory in which the conda and conda-archive directories are created. These are used to store conda environments and their archives, respectively. If not supplied, the value is set to the .snakemake directory relative to the invocation directory. If supplied, the --use-conda flag must also be set. The value may be given as a relative path, which will be interpreted relative to the invocation directory, or as an absolute path. The value can also be provided via the environment variable $SNAKEMAKE_CONDA_PREFIX. In any case, the prefix may contain environment variables which will be properly expanded. Note that if you use remote execution e.g. on a cluster and you have node specific values for this, you should disable assuming shared fs for software-deployment (see --shared-fs-usage).
- --conda-cleanup-envs
Cleanup unused conda environments.
Default:
False
- --conda-cleanup-pkgs
Possible choices: tarballs, cache
Cleanup conda packages after creating environments. In case of tarballs mode, will clean up all downloaded package tarballs. In case of cache mode, will additionally clean up unused package caches.
Default:
tarballs
- --conda-create-envs-only
If specified, only creates the job-specific conda environments, then exits. The --use-conda flag must also be set.
Default:
False
- --conda-frontend
Possible choices: conda, mamba
Choose the conda frontend for installing environments.
Default:
'conda'
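The conda options above can be combined into a two-step pattern: first create the environments, then run the workflow. The sketch below only prints the commands. The prefix directory is a hypothetical location chosen for illustration, not a Snakemake default.

```shell
# Hypothetical shared directory for conda environments (an assumption).
SNAKEMAKE_CONDA_PREFIX="$HOME/snakemake-conda"
export SNAKEMAKE_CONDA_PREFIX

# Step 1: only create the job-specific environments, using mamba as frontend.
create_cmd="snakemake --cores 1 --use-conda --conda-create-envs-only --conda-frontend mamba"

# Step 2: run the workflow; the prefix is picked up from the environment variable.
run_cmd="snakemake --cores 4 --use-conda"

# Printed only; paste the lines into a shell to run them.
echo "$create_cmd"
echo "$run_cmd"
```

Setting the prefix through the environment variable keeps both invocations pointed at the same directory without repeating --conda-prefix.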
APPTAINER/SINGULARITY
- --use-apptainer, --use-singularity
If defined in the rule, run job within an apptainer/singularity container. If this flag is not set, the singularity directive is ignored.
Default:
False
- --apptainer-prefix, --singularity-prefix
Specify a directory in which apptainer/singularity images will be stored. If not supplied, the value is set to the .snakemake directory relative to the invocation directory. If supplied, the --use-apptainer flag must also be set. The value may be given as a relative path, which is resolved against the invocation directory, or as an absolute path. If not supplied, APPTAINER_CACHEDIR is used. In any case, the prefix may contain environment variables, which will be properly expanded. Note that if you use remote execution, e.g. on a cluster, and you have node-specific values for this, you should disable the shared filesystem assumption for software-deployment (see --shared-fs-usage).
- --apptainer-args, --singularity-args
Pass additional args to apptainer/singularity.
Default:
''
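A minimal sketch of how the three apptainer options fit together is shown below. The image directory and the bind mount are hypothetical values for illustration. The --bind option belongs to apptainer itself and is simply passed through via --apptainer-args.

```shell
# Hypothetical image directory and bind mount; adjust both to your system.
image_dir="$HOME/apptainer-images"
extra_args="--bind /data"

# Printed only; paste the line into a shell to run it.
cmd="snakemake --cores 4 --use-apptainer --apptainer-prefix ${image_dir} --apptainer-args '${extra_args}'"
echo "$cmd"
```

The extra arguments are wrapped in single quotes so that they reach Snakemake as one value rather than being parsed as Snakemake options.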
ENVIRONMENT MODULES
- --use-envmodules
If defined in the rule, run job within the given environment modules, loaded in the given order. This can be combined with --use-conda and --use-singularity, which will then be only used as a fallback for rules which don’t define environment modules.
Default:
False
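The fallback behaviour described for --use-envmodules can be sketched as a single invocation: rules that define environment modules use them, and the remaining rules fall back to their conda environments. The core count is an arbitrary illustrative value.

```shell
# Environment modules take precedence; conda is the fallback for rules
# that do not define any modules. Printed only, not executed.
cmd="snakemake --cores 4 --use-envmodules --use-conda"
echo "$cmd"
```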
INTERNAL
- --deploy-sources
Internal use only: Deploy sources archive from given storage provider query to the current working subdirectory and control for archive checksum to proceed.
- --target-jobs
Internal use only: Target particular jobs by RULE:WILDCARD1=VALUE,WILDCARD2=VALUE,…
Default:
set()
- --mode
Possible choices: subprocess, default, remote
Internal use only: Set execution mode of Snakemake.
Default:
default
- --scheduler-solver-path
Internal use only: Set the PATH to search for scheduler solver binaries. Deprecated, use --scheduler-ilp-solver-path instead.
DEPRECATED
- --max-jobs-per-second
Maximal number of job submissions/executions per second. Deprecated in favor of --max-jobs-per-timespan.
html report plugin settings
- --report-html-path
Path to the report file (either .html or .zip). Use zip if your report contains large results or directories with htmlindex as results.
- --report-html-stylesheet-path
Path to a custom stylesheet for the report.
greedy scheduler plugin settings
- --scheduler-greedy-greediness
Set the greediness of scheduling. This value between 0 and 1 determines how carefully jobs are selected for execution. The default value (1.0) provides the best speed and still acceptable scheduling quality.
Default:
1.0
- --scheduler-greedy-omit-prioritize-by-temp-and-input
If set, jobs with larger temporary or input files are not prioritized. The rationale of the prioritization is that temp files should be removed as soon as possible, and larger input files may take longer to process, so it is better to start them earlier.
ilp scheduler plugin settings
- --scheduler-ilp-solver
Possible choices: PULP_CBC_CMD
Set the MILP solver to use.
- --scheduler-ilp-solver-path
Set the PATH to search for scheduler solver binaries.
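The scheduler plugin settings above might be used as in the following sketch, which prints one invocation per scheduler. It assumes that the scheduler is selected with a --scheduler option taking the values greedy and ilp. That option is not described in this section, so check snakemake --help for your version. The greediness value 0.5 is an arbitrary illustrative choice within the documented range of 0 to 1.

```shell
# Assumption: --scheduler selects the scheduler plugin (not covered above).
# A greediness below the default of 1.0 trades speed for scheduling quality.
greedy_cmd="snakemake --cores 8 --scheduler greedy --scheduler-greedy-greediness 0.5"

# PULP_CBC_CMD is the only solver choice listed for --scheduler-ilp-solver.
ilp_cmd="snakemake --cores 8 --scheduler ilp --scheduler-ilp-solver PULP_CBC_CMD"

# Printed only; paste a line into a shell to run it.
echo "$greedy_cmd"
echo "$ilp_cmd"
```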