Additional utils

class snakemake.utils.AlwaysQuotedFormatter(quote_func=None, *args, **kwargs)

Subclass of QuotedFormatter that always quotes.

Usage is identical to QuotedFormatter, except that it always acts as if “q” were appended to the format spec, unless “u” (for unquoted) is appended.
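
The always-quote rule can be illustrated with a standalone sketch built directly on Python's string.Formatter. The class name AlwaysQuotedSketch is hypothetical and this is not Snakemake's implementation. It assumes only the documented rules: every field is quoted (with shlex.quote by default) unless its format spec ends in “u”, and empty results are left unquoted, as with QuotedFormatter.

```python
import shlex
import string


class AlwaysQuotedSketch(string.Formatter):
    """Hypothetical stand-in for AlwaysQuotedFormatter (not Snakemake's code)."""

    def __init__(self, quote_func=None):
        super().__init__()
        # Shell quoting by default, mirroring the documented default.
        self.quote_func = quote_func if quote_func is not None else shlex.quote

    def format_field(self, value, format_spec):
        if format_spec.endswith("u"):
            # A trailing "u" opts this field out of quoting.
            return super().format_field(value, format_spec[:-1])
        formatted = super().format_field(value, format_spec)
        # Empty strings are not quoted.
        return self.quote_func(formatted) if formatted else formatted


fmt = AlwaysQuotedSketch()
cmd = fmt.format("grep {pattern} {files:u}", pattern="foo bar", files="*.txt")
print(cmd)  # grep 'foo bar' *.txt
```

Leaving the glob unquoted with “:u” lets the shell expand it, while the search pattern stays a single argument.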

class snakemake.utils.Paramspace(dataframe, filename_params=None, param_sep='~')

A wrapper for pandas dataframes that provides helpers for using them as a parameter space in Snakemake.

This is heavily inspired by @soumitrakp’s work on JUDI (https://github.com/ncbi/JUDI).

By default, a directory structure with one folder level per parameter is created (e.g. column1~{column1}/column2~{column2}/…).

The exact behavior can be tweaked with two parameters:

  • filename_params takes a list of column names of the passed dataframe. These names are used to build the filename (separated by ‘_’) in the order in which they are passed. All remaining parameters are used to generate a directory structure. Example for a dataframe with four columns named column1 to column4:

    Paramspace(df, filename_params=["column3", "column2"]) ->
    column1~{value1}/column4~{value4}/column3~{value3}_column2~{value2}
  • param_sep takes a string which is used to join the column name and column value in the generated paths (default: ‘~’). Example:

    Paramspace(df, param_sep=":") ->
    column1:{value1}/column2:{value2}/column3:{value3}/column4:{value4}
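
To make the layout rules above concrete, the following sketch builds the wildcard form of the pattern (with {column} placeholders) from a plain list of column names. The function paramspace_pattern is hypothetical and uses neither pandas nor Snakemake; the real Paramspace takes its columns from the dataframe.

```python
def paramspace_pattern(columns, filename_params=None, param_sep="~"):
    """Hypothetical helper mimicking the documented Paramspace path layout."""
    filename_params = list(filename_params or [])

    def part(col):
        # e.g. "column1~{column1}" with the default separator
        return f"{col}{param_sep}{{{col}}}"

    # Columns not listed in filename_params become directory levels, in order.
    levels = [part(col) for col in columns if col not in filename_params]
    if filename_params:
        # The filename joins the listed columns with "_", in the order given.
        levels.append("_".join(part(col) for col in filename_params))
    return "/".join(levels)


cols = ["column1", "column2", "column3", "column4"]
print(paramspace_pattern(cols, filename_params=["column3", "column2"]))
# column1~{column1}/column4~{column4}/column3~{column3}_column2~{column2}
print(paramspace_pattern(cols, param_sep=":"))
# column1:{column1}/column2:{column2}/column3:{column3}/column4:{column4}
```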
instance(wildcards)

Obtain the instance (dataframe row) matching the given wildcard values.

instance_patterns

Iterator over all instances of the parameter space (dataframe rows), formatted as file patterns of the form column1~{value1}/column2~{value2}/… or of the provided custom pattern.

wildcard_pattern

Wildcard pattern over all columns of the underlying dataframe of the form column1~{column1}/column2~{column2}/… or of the provided custom pattern.
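
The relationship between the pattern, instance_patterns and instance() can be sketched with a list of dicts standing in for dataframe rows. The functions below are hypothetical illustrations, not the Paramspace API; in particular, comparing wildcard values by their string form is a simplifying assumption of this sketch.

```python
rows = [
    {"alpha": 1, "beta": 0.1},
    {"alpha": 2, "beta": 0.5},
]
pattern = "alpha~{alpha}/beta~{beta}"


def instance_patterns(rows, pattern):
    # One concrete path per row, analogous to Paramspace.instance_patterns.
    for row in rows:
        yield pattern.format(**row)


def instance(rows, wildcards):
    # Wildcard values arrive as strings, so compare string forms.
    for row in rows:
        if all(str(value) == wildcards[key] for key, value in row.items()):
            return row
    raise KeyError(f"no row matches {wildcards}")


print(list(instance_patterns(rows, pattern)))
# ['alpha~1/beta~0.1', 'alpha~2/beta~0.5']
print(instance(rows, {"alpha": "2", "beta": "0.5"}))
# {'alpha': 2, 'beta': 0.5}
```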

class snakemake.utils.QuotedFormatter(quote_func=None, *args, **kwargs)

Subclass of string.Formatter that supports quoting.

Using this formatter, any field can be quoted after formatting by appending “q” to its format string. By default, shell quoting is performed using “shlex.quote”, but you can pass a different quote_func to the constructor. The quote_func simply has to take a string argument and return a new string representing the quoted form of the input string.

Note that if an element after formatting is the empty string, it will not be quoted.
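
A minimal sketch of the quoting mechanism, written as a standalone string.Formatter subclass. The class name QuotedFormatterSketch is hypothetical and this is not Snakemake's source. It assumes only the documented behavior: a trailing “q” in the format spec quotes the formatted field with quote_func (shlex.quote by default), and empty results are not quoted.

```python
import shlex
import string


class QuotedFormatterSketch(string.Formatter):
    """Hypothetical: a trailing "q" in the format spec quotes the field."""

    def __init__(self, quote_func=None):
        super().__init__()
        # Default to shell quoting, as the documented class does.
        self.quote_func = quote_func if quote_func is not None else shlex.quote

    def format_field(self, value, format_spec):
        if format_spec.endswith("q"):
            # Strip the "q" marker, format normally, then quote.
            formatted = super().format_field(value, format_spec[:-1])
            # Empty strings are left unquoted.
            return self.quote_func(formatted) if formatted else formatted
        return super().format_field(value, format_spec)


fmt = QuotedFormatterSketch()
print(fmt.format("cat {path:q}", path="my file.txt"))  # cat 'my file.txt'
```

Any callable that maps a string to its quoted form can be passed as quote_func, for example one that wraps the value in double quotes.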

snakemake.utils.R(code)

Execute R code.

This is deprecated in favor of the script directive. This function executes the R code given as a string. The function requires rpy2 to be installed.

Parameters:
  • code (str) – R code to be executed

class snakemake.utils.SequenceFormatter(separator=' ', element_formatter=<string.Formatter object>, *args, **kwargs)

string.Formatter subclass with special behavior for sequences.

This class delegates formatting of individual elements to another formatter object. Non-list objects are formatted by calling the delegate formatter’s “format_field” method. List-like objects (list, tuple, set, frozenset) are formatted by formatting each element of the list according to the specified format spec using the delegate formatter and then joining the resulting strings with a separator (space by default).

format_element(elem, format_spec)

Format a single element.

For sequences, this is called once for each element in a sequence. For anything else, it is called on the entire object. It is intended to be overridden in subclasses.
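
The delegation and the format_element hook can be sketched as follows. SequenceFormatterSketch and UpperSequenceFormatter are hypothetical names, and this is not Snakemake's code; the sketch assumes only the documented behavior that list, tuple, set and frozenset values are formatted element by element with the same format spec and then joined with the separator.

```python
import string


class SequenceFormatterSketch(string.Formatter):
    """Hypothetical re-implementation of the documented behavior."""

    def __init__(self, separator=" ", element_formatter=None):
        super().__init__()
        self.separator = separator
        self.element_formatter = element_formatter or string.Formatter()

    def format_element(self, elem, format_spec):
        # Delegate each element to the wrapped formatter.
        return self.element_formatter.format_field(elem, format_spec)

    def format_field(self, value, format_spec):
        if isinstance(value, (list, tuple, set, frozenset)):
            # Format every element with the same spec, then join.
            return self.separator.join(
                self.format_element(elem, format_spec) for elem in value
            )
        return self.format_element(value, format_spec)


class UpperSequenceFormatter(SequenceFormatterSketch):
    # Example override of the per-element hook.
    def format_element(self, elem, format_spec):
        return super().format_element(elem, format_spec).upper()


fmt = SequenceFormatterSketch()
print(fmt.format("cat {files}", files=["r1.fq", "r2.fq"]))  # cat r1.fq r2.fq
print(SequenceFormatterSketch(separator=",").format("{x:.1f}", x=(1, 2.5)))  # 1.0,2.5
print(UpperSequenceFormatter().format("{names}", names=["a", "b"]))  # A B
```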

snakemake.utils.argvquote(arg, force=True)

Return an argument quoted in such a way that CommandLineToArgvW on Windows will return the argument string unchanged. This is the same thing Popen does when supplied with a list of arguments. Arguments in a command line should be separated by spaces; this function does not add these spaces. This implementation follows the suggestions outlined here: https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
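
The quoting rules from the linked article can be sketched as below. This is a hypothetical reimplementation for illustration, not Snakemake's source; the force flag is assumed to mean “quote even when the argument contains no whitespace or double quotes”, as in the article.

```python
def argvquote_sketch(arg, force=True):
    """Quote arg so CommandLineToArgvW parses it back unchanged (sketch)."""
    if not force and arg and not any(c in arg for c in ' \t\n\v"'):
        # Nothing that needs quoting; return the argument as-is.
        return arg
    quoted = ['"']
    backslashes = 0
    for char in arg:
        if char == "\\":
            # Count backslashes; how many to emit depends on what follows.
            backslashes += 1
        elif char == '"':
            # Escape the pending backslashes and the quote itself.
            quoted.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            # Backslashes not followed by a quote are taken literally.
            quoted.append("\\" * backslashes + char)
            backslashes = 0
    # Double trailing backslashes so the closing quote is not escaped.
    quoted.append("\\" * (2 * backslashes) + '"')
    return "".join(quoted)


print(argvquote_sketch('say "hi"'))  # "say \"hi\""
```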

snakemake.utils.available_cpu_count()

Return the number of available virtual or physical CPUs on this system. The number of available CPUs can be smaller than the total number of CPUs when the cpuset(7) mechanism is in use, as is the case on some cluster systems.

Adapted from https://stackoverflow.com/a/1006301/715090
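
In current Python, one way to get a cpuset-aware count is os.sched_getaffinity, which is a different technique from the /proc-based approach in the linked answer. The sketch below is a hypothetical alternative, not Snakemake's implementation; os.sched_getaffinity only exists on some platforms (e.g. Linux), hence the fallback to os.cpu_count.

```python
import os


def available_cpu_count_sketch():
    """Hypothetical alternative: count CPUs usable by this process."""
    if hasattr(os, "sched_getaffinity"):
        # The affinity mask reflects cpuset restrictions on Linux.
        return len(os.sched_getaffinity(0))
    # Fallback: all logical CPUs, which may overcount under a cpuset.
    return os.cpu_count() or 1


print(available_cpu_count_sketch())
```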

snakemake.utils.format(_pattern, *args, stepout=1, _quote_all=False, **kwargs)

Format a pattern in Snakemake style.

This means that keywords embedded in braces are replaced by any variable values that are available in the current namespace.
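
The namespace lookup can be sketched with the inspect module: the function walks stepout frames up the call stack and passes that frame's variables to str.format. This is a simplified, hypothetical version (format_sketch is not part of Snakemake) and it omits Snakemake-specific behavior such as the handling of _quote_all and of list values.

```python
import inspect


def format_sketch(_pattern, *args, stepout=1, **kwargs):
    """Hypothetical: fill {names} in _pattern from the caller's variables."""
    frame = inspect.currentframe()
    try:
        # Step out of this function (and further, if stepout > 1).
        for _ in range(stepout):
            frame = frame.f_back
        variables = dict(frame.f_globals)
        variables.update(frame.f_locals)
    finally:
        # Drop the frame reference to avoid keeping a reference cycle alive.
        del frame
    # Explicit keyword arguments take precedence over namespace variables.
    variables.update(kwargs)
    return _pattern.format(*args, **variables)


def build_command():
    sample = "A"
    threads = 4
    return format_sketch("bwa mem -t {threads} ref.fa {sample}.fq")


print(build_command())  # bwa mem -t 4 ref.fa A.fq
```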

snakemake.utils.linecount(filename)

Return the number of lines of the given file.

Parameters:
  • filename (str) – the path to the file

snakemake.utils.listfiles(pattern, restriction=None, omit_value=None)

Yield a tuple for each existing file path that matches the given pattern.

Wildcard values are yielded as the second tuple item.

Parameters:
  • pattern (str) – a filepattern. Wildcards are specified in snakemake syntax, e.g. “{id}.txt”
  • restriction (dict) – restrict to wildcard values given in this dictionary
  • omit_value (str) – wildcard value to omit
Yields:

tuple – The next file matching the pattern, and the corresponding wildcards object
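
How a Snakemake-style pattern can be matched against existing files is sketched below: each {name} wildcard becomes a named regular-expression group, and candidate paths are collected with glob. This is a hypothetical simplification (listfiles_sketch is not Snakemake's code). It assumes each wildcard name appears at most once, ignores wildcard constraints and omit_value, and, because glob's “*” does not cross directory separators, a wildcard cannot span directories here.

```python
import glob
import os
import re
import tempfile

WILDCARD = re.compile(r"\{(\w+)\}")


def listfiles_sketch(pattern, restriction=None):
    """Hypothetical: yield (path, wildcards) for files matching pattern."""
    # Build a regex with one named group per wildcard, escaping literal text.
    parts, last = [], 0
    for m in WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[last:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    regex = re.compile("".join(parts) + "$")
    # Replace each wildcard with "*" to get candidate paths from glob.
    for path in sorted(glob.glob(WILDCARD.sub("*", pattern))):
        match = regex.match(path)
        if match is None:
            continue
        wildcards = match.groupdict()
        # Keep only paths whose wildcard values agree with the restriction.
        if restriction and any(wildcards.get(k) != v for k, v in restriction.items()):
            continue
        yield path, wildcards


with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt", "c.csv"):
        open(os.path.join(d, name), "w").close()
    pattern = os.path.join(d, "{id}.txt")
    found = [(os.path.basename(p), w) for p, w in listfiles_sketch(pattern)]
    only_b = [w for _, w in listfiles_sketch(pattern, restriction={"id": "b"})]

print(found)   # [('a.txt', {'id': 'a'}), ('b.txt', {'id': 'b'})]
print(only_b)  # [{'id': 'b'}]
```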

snakemake.utils.makedirs(dirnames)

Recursively create the given directory or directories, without reporting errors if they already exist.

snakemake.utils.min_version(version)

Require a minimum Snakemake version; raise a workflow error if it is not met.

snakemake.utils.os_sync()

Ensure that pending writes are flushed to disk.

snakemake.utils.read_job_properties(jobscript, prefix='# properties', pattern=re.compile('# properties = (.*)'))

Read the job properties defined in a snakemake jobscript.

This function is a helper for writing custom wrappers for the snakemake --cluster functionality. Applying this function to a jobscript will return a dict containing information about the job.
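
A minimal sketch of what reading the properties involves, under the assumption that the jobscript contains a line of the form “# properties = …” followed by a JSON object (matching the default pattern in the signature). The function name read_job_properties_sketch and the example jobscript contents are hypothetical.

```python
import json
import os
import re
import tempfile

PROPERTIES = re.compile("# properties = (.*)")


def read_job_properties_sketch(jobscript, pattern=PROPERTIES):
    """Hypothetical: return the JSON object from the first properties line."""
    with open(jobscript) as handle:
        for line in handle:
            match = pattern.match(line)
            if match:
                return json.loads(match.group(1))
    return None  # no properties line found


with tempfile.TemporaryDirectory() as d:
    script = os.path.join(d, "jobscript.sh")
    with open(script, "w") as handle:
        handle.write("#!/bin/sh\n")
        handle.write('# properties = {"rule": "map_reads", "threads": 4}\n')
        handle.write("echo running\n")
    props = read_job_properties_sketch(script)

print(props["threads"])  # 4
```

A custom submission wrapper could then read values such as the thread count from the returned dict when building its own submission command.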

snakemake.utils.report(text, path, stylesheet=None, defaultenc='utf8', template=None, metadata=None, **files)

Create an HTML report using Python docutils.

This is deprecated in favor of the --report flag.

Attention: This function needs Python docutils to be installed for the Python installation you use with Snakemake.

All keywords not listed below are interpreted as paths to files that shall be embedded into the document. The keywords will be available as link targets in the text. E.g. append a file as keyword arg via F1=input[0] and put a download link in the text like this:

report('''
==============
Report for ...
==============

Some text. A link to an embedded file: F1_.

Further text.
''', outputpath, F1=input[0])

Instead of specifying each file as a keyword arg, you can also pass the entire input of your rule as keyword args if all of its input files are named, e.g.:

report('''
Some text...
''', outputpath, **input)

Parameters:
  • text (str) – The reStructuredText source, as expected by Python docutils.
  • path (str) – The path to the desired output file.
  • stylesheet (str) – An optional path to a CSS file that defines the style of the document. This defaults to <your snakemake install>/report.css. Use the default as a starting point for creating your own.
  • defaultenc (str) – The encoding that is reported to the browser for embedded text files; defaults to utf8.
  • template (str) – An optional path to a docutils HTML template.
  • metadata (str) – Optional metadata, e.g. an author name or email address.

snakemake.utils.simplify_path(path)

Return a simplified version of the given path.

snakemake.utils.update_config(config, overwrite_config)

Recursively update dictionary config with overwrite_config.

See https://stackoverflow.com/questions/3232943/update-value-of-a-nested-dictionary-of-varying-depth for details.

Parameters:
  • config (dict) – dictionary to update
  • overwrite_config (dict) – dictionary whose items will overwrite those in config
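
The recursive merge can be sketched as follows, in the spirit of the linked Stack Overflow answer. update_config_sketch is a hypothetical name and not Snakemake's code: nested mappings are merged key by key and all other values are replaced. As an assumption of this sketch, a non-dict value in config is replaced outright when the override holds a mapping at that key.

```python
from collections.abc import Mapping


def update_config_sketch(config, overwrite_config):
    """Hypothetical: recursively merge overwrite_config into config in place."""
    for key, value in overwrite_config.items():
        if isinstance(value, Mapping):
            current = config.get(key)
            # Recurse into nested dicts; start fresh if config has no dict here.
            if not isinstance(current, dict):
                current = {}
            config[key] = current
            update_config_sketch(current, value)
        else:
            # Scalars and lists are replaced, not merged.
            config[key] = value


config = {"samples": {"A": "a.fq", "B": "b.fq"}, "threads": 2}
update_config_sketch(config, {"samples": {"B": "b2.fq"}, "threads": 8})
print(config)
# {'samples': {'A': 'a.fq', 'B': 'b2.fq'}, 'threads': 8}
```
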
snakemake.utils.validate(data, schema, set_default=True)

Validate data against the JSON schema at the given path.

Parameters:
  • data (object) – data to validate. Can be a config dict or a pandas data frame.
  • schema (str) – Path to JSON schema used for validation. The schema can also be in YAML format. If validating a pandas data frame, the schema has to describe a row record (i.e., a dict with column names as keys pointing to row values). See https://json-schema.org. The path is interpreted relative to the Snakefile when this function is called.
  • set_default (bool) – set default values defined in the schema. See https://python-jsonschema.readthedocs.io/en/latest/faq/ for more information.