Front End Reference

With an instance of dynamo installed, a simple consistency check on a site can be run through the Python API as follows:

from dynamo_consistency import config, datatypes, remotelister, inventorylister

config.LOCATION = '/path/to/config.json'
site = 'T2_US_MIT'                        # For example

inventory_listing = inventorylister.listing(site)
remote_listing = remotelister.listing(site)

datatypes.compare(inventory_listing, remote_listing, 'results')

In this example, the list of file LFNs in the inventory and not at the site will be in results_missing.txt. The list of file LFNs at the site and not in the inventory will be in results_orphan.txt. The listing functions can be re-implemented to perform the check desired. This is detailed more in Back End Requirements.
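
Conceptually, the two output files hold the two one-way differences between the listings. The sketch below illustrates that idea with plain Python sets. It does not use dynamo_consistency; the function name split_differences and the sample LFNs are made up for illustration.

```python
def split_differences(inventory_lfns, site_lfns):
    """Return (missing, orphan) as sorted lists of LFNs."""
    inventory = set(inventory_lfns)
    site = set(site_lfns)
    missing = sorted(inventory - site)   # In the inventory, absent at the site
    orphan = sorted(site - inventory)    # At the site, unknown to the inventory
    return missing, orphan


missing, orphan = split_differences(
    ['/store/mc/a.root', '/store/mc/b.root'],
    ['/store/mc/b.root', '/store/mc/c.root'],
)
```

Here missing corresponds to what would be written to results_missing.txt and orphan to results_orphan.txt.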

The following is a full reference to the submodules directly inside of the dynamo_consistency module. These are the modules intended to be interacted with by a typical user. To see more details about the backend connections to dynamo and remote sites, see Back End Requirements.

cache.py

dynamo_consistency.cache.cache_tree(config_age, location_suffix)[source]

A decorator for caching pickle files based on the configuration file. It is currently set up to decorate a function that has a single parameter site.

The returned function also can be passed keyword arguments to override the location_suffix argument. This is done with the cache argument to the function.

Parameters:
  • config_age (str) – The key from the config file to read the max age from
  • location_suffix (str) – The ending of the main part of the file name where the cached file is saved.
Returns:

A function that uses the caching configuration

Return type:

func
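
The general pattern of an age-based pickle cache around a single-parameter function can be sketched as follows. This is not the cache_tree implementation: the name cache_by_age is hypothetical, and the maximum age and cache directory are passed directly here rather than being read from the configuration file.

```python
import functools
import os
import pickle
import tempfile
import time


def cache_by_age(max_age, cache_dir):
    """Decorator factory: reuse a pickled result younger than max_age seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(site):
            path = os.path.join(cache_dir, '%s_%s.pkl' % (site, func.__name__))
            if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
                with open(path, 'rb') as cached:
                    return pickle.load(cached)
            result = func(site)
            with open(path, 'wb') as cached:
                pickle.dump(result, cached)
            return result
        return wrapper
    return decorator


calls = []


@cache_by_age(max_age=3600, cache_dir=tempfile.mkdtemp())
def listing(site):
    calls.append(site)            # Record each real (uncached) invocation
    return {'site': site, 'files': ['a.root']}


first = listing('T2_US_MIT')
second = listing('T2_US_MIT')     # Served from the pickle file
```

The second call never reaches the decorated function, which is the behavior that makes caching worthwhile for slow site listings.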

config.py

Small module to get information from the config.

author:Daniel Abercrombie <dabercro@mit.edu>
dynamo_consistency.config.DIRECTORYLIST = None

If this is set to a list of directories, it overrides the DirectoryList set in the configuration file. This prevents the tool from attempting to list directories that are not there.

dynamo_consistency.config.LOADER = <module 'json'>

A module that uses the load function on a file descriptor to return a dictionary. (Examples are the json and yaml modules.) If your LOCATION is not a JSON file, you’ll want to change this also before calling config_dict().

dynamo_consistency.config.LOCATION = 'consistency_config.json'

The string giving the location of the configuration JSON file. Generally, you want to set this value of the module before calling config_dict() to get your configuration.

dynamo_consistency.config.SITE = None

A global place that stores a site that has been picked. Set in dynamo_consistency.picker.pick_site().

dynamo_consistency.config.config_dict()[source]

This only loads the configuration file the first time it is called

Returns:the configuration file in a dictionary
Return type:dict
Raises:IOError – when it cannot find the configuration file
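
The interplay of LOADER, LOCATION, and config_dict() described above can be sketched with a small stand-alone module. This is only an illustration of the load-once pattern, not the library's code; the _CACHE name and the sample configuration value are assumptions.

```python
import json
import os
import tempfile

LOADER = json                  # Any module exposing load(file_object)
LOCATION = 'consistency_config.json'
_CACHE = {}                    # Filled on the first successful load


def config_dict():
    """Load LOCATION with LOADER once, then return the cached dictionary."""
    if 'config' not in _CACHE:
        with open(LOCATION, 'r') as config_file:   # Raises IOError if missing
            _CACHE['config'] = LOADER.load(config_file)
    return _CACHE['config']


# Demonstration with a throwaway file
handle, LOCATION = tempfile.mkstemp(suffix='.json')
with os.fdopen(handle, 'w') as output:
    json.dump({'VarLocation': '/tmp/var'}, output)

first = config_dict()
os.remove(LOCATION)            # Not needed again: later calls use the cache
second = config_dict()
```

Because the file is only read on the first call, LOCATION and LOADER have to be set before that call for the change to take effect.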
dynamo_consistency.config.vardir(directory)[source]

Gets the full path to a sub directory inside of VarLocation and creates an empty directory if needed.

Parameters:directory (str) – A desired sub-directory
Returns:Path to configured sub-directory
Return type:str

create.py

Module that contains all of the threading functions needed to create a dynamo_consistency.datatypes.DirectoryInfo object.

author:Daniel Abercrombie <dabercro@mit.edu>
class dynamo_consistency.create.ListingThread(number, recv, send, filler)[source]

A thread that does the listing

Parameters:
  • number (int) – A unique number of the thread created. This is used to generate a thread name and to fetch a logger.
  • recv (multiprocessing.Pipe) – One end of a pipe for communicating with the master thread. Receives messages here.
  • send (multiprocessing.Queue) – A queue to send messages to the master thread.
  • filler (function) –

    The function that takes a path as an argument and returns:

    • A bool saying whether the listing was successful or not
    • A list of tuples of sub-directories and their mod times
    • A list of tuples of files inside, their sizes, and their mod times
out_queue = <multiprocessing.queues.Queue object>

A queue where all of the listing outputs are placed. A master thread is expected to read from and clear this.

static put_first_dir(location, directory)[source]

Place the first set of parameters for the ListingThread objects to start from. Should not be called once the threads are started.

Parameters:
  • location (str) – This is the beginning of the path where we will find first_dir. For example, to find the first directory mc, we also have to say where it is. For using CMS LFNs, location would be /store (where mc is inside).
  • directory (str) – Name of the first directory to run over inside of location
Returns:

The name that the first DirectoryInfo object should be. This is just the first directory in the directory parameter.

Return type:

str

run()[source]

Runs the listing thread

dynamo_consistency.create.create_dirinfo(location, first_dir, filler, object_params=None, callback=None)[source]

Create the directory information

Parameters:
  • location (str) – This is the beginning of the path where we will find first_dir. For example, to find the first directory mc, we also have to say where it is. For using CMS LFNs, location would be /store (where mc is inside). This is a path.
  • first_dir (str) – The name of the first directory that is inside the path of location. This should not be a path, but the name of the directory to list recursively.
  • filler (function or constructor) –

    This is either a function that lists the directory contents given just a path of os.path.join(location, first_dir), or it is a constructor that does the same thing with a member function called list. If filler is an object constructor, the parameters for the object creation must be passed through the parameter object_params. Both listings must return the following tuple:

    • A bool saying whether the listing was successful or not
    • A list of tuples of sub-directories and their mod times
    • A list of tuples of files inside, their sizes, and their mod times
  • object_params (list) – This only needs to be set when filler is an object constructor. Each element in the list is a tuple of arguments to pass to the constructor.
  • callback (function) – A function that is called every time the master thread has finished checking the child threads. This can happen very many times at large sites. The function is called with the main DirectoryTree as its argument.
Returns:

A DirectoryInfo object, named first_dir, containing everything from the directory listings under os.path.join(location, first_dir).

Return type:

DirectoryInfo

Raises:

messaging.Killed – When a site has been stopped
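
A filler function for a local file system might look like the sketch below. It follows the three-part return value described above; the name local_filler and the exact tuple layouts, (name, mod time) for directories and (name, size, mod time) for files, are assumptions based on that description.

```python
import os
import tempfile


def local_filler(path):
    """List one directory; returns (success, directories, files)."""
    try:
        names = os.listdir(path)
    except OSError:
        return False, [], []           # Listing failed

    directories, files = [], []
    for name in sorted(names):
        full = os.path.join(path, name)
        stat = os.stat(full)
        if os.path.isdir(full):
            directories.append((name, stat.st_mtime))
        else:
            files.append((name, stat.st_size, stat.st_mtime))
    return True, directories, files


# Demonstration on a temporary directory
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'mc'))
with open(os.path.join(root, 'file.root'), 'w') as output:
    output.write('12345')

ok, dirs, files = local_filler(root)
failed = local_filler(os.path.join(root, 'does_not_exist'))
```

A function of this shape lists only one level; the recursion over sub-directories is what create_dirinfo is described as handling.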

datatypes.py

Module defines the datatypes that are used for storage and comparison. There is also a powerful create_dirinfo function that takes a filler function or object and uses the multiprocessing module to recursively list directories in parallel.

author:Daniel Abercrombie <dabercro@mit.edu>
exception dynamo_consistency.datatypes.BadPath[source]

An exception for throwing when the path doesn’t make sense for various methods of a DirectoryInfo

class dynamo_consistency.datatypes.DirectoryInfo(name='', directories=None, files=None)[source]

Stores all of the information of the contents of a directory

Parameters:
  • name (str) – The name of the directory
  • directories (list) – If this is set, the infos in the list are merged into a master DirectoryInfo.
  • files (list) – List of tuples containing information about files in the directory.
add_file_list(file_infos)[source]

Add a list of tuples containing file_name, file_size to the node. This is most useful when you get a list of files from some other source and want to easily convert that list into a DirectoryInfo()

Parameters:file_infos (list) – The list of files (full path, size in bytes[, timestamp])
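
To illustrate the kind of conversion this performs, the sketch below turns a flat list of (full path, size[, timestamp]) tuples into a nested tree. It uses plain dictionaries rather than DirectoryInfo, and build_tree is a hypothetical name.

```python
def build_tree(file_infos):
    """Group (full path, size[, timestamp]) tuples into nested dictionaries."""
    tree = {'dirs': {}, 'files': []}
    for info in file_infos:
        path, size = info[0], info[1]
        parts = path.strip('/').split('/')
        node = tree
        for directory in parts[:-1]:
            # Create intermediate directory nodes on demand
            node = node['dirs'].setdefault(directory, {'dirs': {}, 'files': []})
        node['files'].append((parts[-1], size))
    return tree


tree = build_tree([
    ('/store/mc/a.root', 100),
    ('/store/mc/b.root', 200),
    ('/store/data/c.root', 300, 1500000000),   # Optional timestamp is ignored here
])
```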
add_files(files)[source]

Set the files for this DirectoryInfo node

Parameters:files (list) – The tuples of file information. Each element consists of file name, size, and mod time.
Returns:self for chaining calls
Return type:DirectoryInfo
compare(other, path='', check=None)[source]

Does one way comparison with a different tree

Parameters:
  • other (DirectoryInfo) – The directory tree to compare this one to
  • path (str) – Is the path to get to this location so far
  • check (function) – An optional function that double checks a file name. If the checking function returns True for a file name, the file will not be included in the output.
Returns:

Tuple of the list of files and the list of directories that are present in this tree but not in the other, and the total size of those files

Return type:

list, list, long

count_nodes(empty=False)[source]
Parameters:empty (bool) – If True, only return the number of empty nodes
Returns:The total number of nodes in this Directory Info. This corresponds to approximately the number of listing requests required to build the data.
Return type:int
display(path='')[source]

Print out the contents of this DirectoryInfo

Parameters:path (str) – The full path to this DirectoryInfo instance
displays(path='')[source]

Get the string to print out the contents of this DirectoryInfo.

Parameters:path (str) – The full path to this DirectoryInfo instance
Returns:The display string
Return type:str
empty_nodes_list()[source]

This function should be used to get the nodes to delete in the proper order for non-recursive deletion

Returns:The list of empty directories to delete in the order to delete
Return type:list
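
Non-recursive deletion requires that a directory's empty children are removed before the directory itself. One possible ordering, deepest paths first, is sketched below; whether the library orders its list exactly this way is an assumption, and deletion_order is a hypothetical name.

```python
def deletion_order(empty_dirs):
    """Sort directories so that children come before their parents."""
    return sorted(empty_dirs, key=lambda path: path.count('/'), reverse=True)


order = deletion_order(['/store/mc', '/store/mc/x/y', '/store/mc/x'])
```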
empty_nodes_set()[source]

This function recursively builds the entire list of empty directories that can be deleted

Returns:The set of empty directories to delete
Return type:set
get_directory_size()[source]

Report the total size used by this directory and its subdirectories.

Returns:Size of files in directory, in bytes
Return type:int
get_file(file_name)[source]

Get the file dictionary based off the name.

Parameters:file_name (str) – The LFN of the file
Returns:Dictionary of file information
Return type:dict
Raises:BadPath – if the file_name does not start with self.name
get_files(min_age=0, path='')[source]

Get the list of files that are older than some age

Parameters:
  • min_age (int) – The minimum age, in seconds, of files to list
  • path (str) – The path to this file. Used for recursive calls
Returns:

List of full file paths

Return type:

list

get_node(path, make_new=True)[source]

Get the node that corresponds to the path given. If the node does not exist yet, and make_new is True, the node is created.

Parameters:
  • path (str) – Path to the desired node from current node. If the path does not exist yet, empty nodes will be created.
  • make_new (bool) – Whether to create a new node if none exists at path
Returns:

A node with the proper path, unless make_new is False and the node doesn’t exist

Return type:

DirectoryInfo or None

get_num_files(unlisted=False, place_new=False)[source]

Report the total number of files stored.

Parameters:
  • unlisted (bool) – If true, return the number of unlisted directories. Otherwise, return only the number of successfully listed files.
  • place_new (bool) – If true, pretend there’s one more file inside any new directory or if files is None. This prevents listing of empty directories to include directories that should not actually be deleted.
Returns:

The number of files in the directory tree structure

Return type:

int

get_unlisted(path='')[source]
Parameters:path (str) – Path to prepend to the name, used in recursive calls
Returns:List of directories that were unlisted
Return type:list
listdir(*args, **kwargs)[source]

Get the list of directory names within a DirectoryInfo. Adding an argument will display the contents of the next directory. For example, if dir.listdir() returns:

0: data
1: mc

dir.listdir(1) then lists the contents of mc and dir.listdir(1, 0) lists the contents of the first subdirectory in mc.

Parameters:
  • args – Is a list of indices to list the subdirectories
  • kwargs – Supports ‘printing’ which is set to a bool. Defaults as True.
Returns:

The DirectoryInfo that is being listed

Return type:

DirectoryInfo

remove_node(path_name)[source]

Remove an empty node from the DirectoryInfo

Parameters:

path_name (str) – The path to the node, including the self.name at the beginning

Returns:

self for chaining

Return type:

DirectoryInfo

Raises:
  • NotEmpty – if the directory is not empty or self.files is None
  • BadPath – if the path_name does not start with the self.name
save(file_name)[source]

Save this DirectoryInfo in a file.

Parameters:file_name (str) – is the location to save the file
setup_hash()[source]

Set the hashes for this DirectoryInfo

exception dynamo_consistency.datatypes.NotEmpty[source]

An exception for throwing when a non-empty directory is deleted from a DirectoryInfo

dynamo_consistency.datatypes.compare(inventory, listing, output_base=None, orphan_check=None, missing_check=None)[source]

Compare two different trees and output the differences into an ASCII file

Parameters:
  • inventory (DirectoryInfo) – The tree of files that should be at a site
  • listing (DirectoryInfo) – The tree of files that are listed remotely
  • output_base (str) – The names of the ASCII report files are generated from this value.
  • orphan_check (function) – A function that double checks each expected orphan. The function takes an LFN as input. If the function returns true, the LFN will not be listed as an orphan.
  • missing_check (function) – A function that double checks each expected missing file. The function takes an LFN as input. If the function returns true, the LFN will not be listed as missing.
Returns:

The two lists, missing and orphan files

Return type:

tuple

dynamo_consistency.datatypes.get_info(file_name)[source]

Get the DirectoryInfo from a file.

Parameters:file_name (str) – is the location of the saved information
Returns:Saved info
Return type:DirectoryInfo

emptyremover.py

Defines a class that can remove nodes from a DirectoryInfo object

class dynamo_consistency.emptyremover.EmptyRemover(site, check=None)[source]

This class handles the removal of empty directories from the tree by behaving as a callback. It also calls deletions for the registry at the same time.

Parameters:
  • site (str) – Site name. If value is None, then don’t enter deletions into the registry, but still remove node from tree
  • check (function) – The function to check against orphans to not delete. The full path name is passed to the function. If it returns True, the directory is not deleted.
fullname(name)[source]

Get the full LFN of a path. Checks if the root is included, and adds it if necessary.

Note

Won’t work if the root name is the same as a relative path inside it.

Parameters:name (str) – May be a relative path to the root
Returns:Full LFN
Return type:str
get_removed_count()[source]
Returns:The number of directories removed by this function object
Return type:int

filters.py

Defines tools for filtering file names out of the compare result

class dynamo_consistency.filters.Filters(*args)[source]

Holds multiple functions for filtering out file names from the dynamo_consistency.datatypes.compare() function.

Parameters:args – An optional list of functions to build the filter
add(func)[source]

The function must take a single argument, which is passed by the __call__ operator of this class. Exceptions should be handled by the function.

Parameters:func (function) – A function to add to this filter
protected(file_name)[source]

Checks if any of the filtering functions passed here return True. Exceptions are not handled here.

Parameters:file_name (str) – A file to check against all other filters
Returns:Result of OR of all stored functions
Return type:bool
class dynamo_consistency.filters.FullFilter[source]

Always returns true

protected(_)[source]

Takes the file name as a dummy variable, but doesn’t check it

class dynamo_consistency.filters.PatternFilter(patterns)[source]

This tells if the named file contains one of the ignored patterns. These are just checked to see that the file name contains one of the listed strings. There’s no regex in here.

Parameters:patterns (list) – List of “patterns” to check.
protected(file_name)[source]
Parameters:file_name (str) – Name of the file to check for patterns in
Returns:True if one of the stored patterns is in the file_name
Return type:bool
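
The filtering behavior described in this module, an OR over stored functions plus plain substring matching, can be sketched without the library as follows. The class and function names here are hypothetical stand-ins, not the real Filters or PatternFilter.

```python
class AnyFilter(object):
    """Holds predicates; a name is protected if any predicate returns True."""

    def __init__(self, *funcs):
        self._funcs = list(funcs)

    def add(self, func):
        self._funcs.append(func)

    def protected(self, file_name):
        return any(func(file_name) for func in self._funcs)


def contains_any(patterns):
    """Build a predicate doing plain substring checks (no regex)."""
    return lambda file_name: any(pattern in file_name for pattern in patterns)


checker = AnyFilter(contains_any(['/unmerged/', '/logs/']))
checker.add(lambda name: name.endswith('.tmp'))

candidates = ['/store/mc/a.root', '/store/unmerged/b.root', '/store/mc/c.tmp']
orphans = [name for name in candidates if not checker.protected(name)]
```

A bound method such as checker.protected has the shape expected of the orphan_check and missing_check arguments: it takes a file name and returns True when the file should be left out of the result.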

history.py

Handles the invalidation of files through a separate read-write process

class dynamo_consistency.history.LockedConn[source]

Similar to dynamo_consistency.summary.LockedConn, but this one handles the history database.

close()[source]

Commit and close the connection

dynamo_consistency.history.empty_directories(site, acting=False)[source]

Get the list of empty directories. If acting on them, the directories are moved into the history database.

Parameters:
  • site (str) – Name of a site to get empty directories for
  • acting (bool) – Whether or not the caller is acting on the list
Returns:

The directory list

Return type:

list

dynamo_consistency.history.finish_run()[source]

Called in dynamo_consistency.main.main() to register the end of a consistency run

dynamo_consistency.history.missing_files(site, acting=False)[source]

Get the missing files from the consistency database. If the caller identifies itself as acting on the list, the list is moved into the history with the acted flag True.

Parameters:
  • site (str) – Name of a site to get missing files for
  • acting (bool) – Whether or not the caller is acting on the files
Returns:

The LFNs that were missing

Return type:

list

dynamo_consistency.history.orphan_files(site, acting=False)[source]

Get the orphan files from the consistency database. If the caller identifies itself as acting on the list, the list is moved into the history with the acted flag True.

Parameters:
  • site (str) – Name of a site to get orphan files for
  • acting (bool) – Whether or not the caller is acting on the files
Returns:

The LFNs that were orphan

Return type:

list

dynamo_consistency.history.report_empty(directories)[source]

Adds empty directories to the history database

Parameters:directories (list) – A list of directory names and mtime (in seconds)
dynamo_consistency.history.report_missing(missing)[source]

Stores a list of missing files in the invalidation table

Parameters:missing (list) – A list of tuples, where each tuple is a name, info dict pair
dynamo_consistency.history.report_orphan(orphan)[source]

Stores a list of orphan files in the orphan table

Parameters:orphan (list) – A list of tuples, where each tuple is a name, info dict pair
dynamo_consistency.history.report_unmerged(unmerged)[source]

Stores a list of deletable unmerged files in the orphan table

Parameters:unmerged (list) – A list of tuples, where each tuple is a name, info dict pair
dynamo_consistency.history.start_run()[source]

Called in dynamo_consistency.main.main() to register the start of a consistency run

dynamo_consistency.history.unmerged_files(site, acting=False)[source]

Get the deletable unmerged files from the consistency database. If the caller identifies itself as acting on the list, the list is moved into the history with the acted flag True.

Parameters:
  • site (str) – Name of a site to get unmerged files for
  • acting (bool) – Whether or not the caller is acting on the files
Returns:

The LFNs in unmerged that are deletable

Return type:

list

inventorylister.py

This module gets the information from the inventory about a site’s contents

author:Daniel Abercrombie <dabercro@mit.edu>
dynamo_consistency.inventorylister.filter_files(site, pathstrip)[source]

Gets the files from the inventory and filters them through the configuration’s DirectoryList

Parameters:
  • site (str) – The site to get the files from
  • pathstrip (int) – The length of the root node’s name that is stripped from the directory name for filtering
Returns:

Tuples for adding to dynamo_consistency.datatypes.DirectoryInfo.add_file_list()

Return type:

generator

dynamo_consistency.inventorylister.listing(site, callback=None, **kwargs)[source]

Get the list of files from the inventory.

Parameters:site (str) – The name of the site to load
Returns:The file replicas that are supposed to be at a site
Return type:dynamo_consistency.datatypes.DirectoryInfo

logsetup.py

The module that sets up logging for us

dynamo_consistency.logsetup.change_logfile(*filenames)[source]

Changes the output file of all of the loggers. Creates any directories that are needed to hold the logs.

Parameters:filenames – The files to write new logs to
dynamo_consistency.logsetup.match_logs(source, targets)[source]
Parameters:
  • source (logging.Logger) – Logger that has handlers to use
  • targets (list) – List of loggers that need handlers updated
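
A minimal sketch of sharing handlers between loggers with the standard logging module is shown below. Whether match_logs replaces the targets' existing handlers or appends to them is not stated above; this sketch assumes replacement, and copy_handlers is a hypothetical name.

```python
import logging


def copy_handlers(source, targets):
    """Replace each target's handlers with the handlers of source."""
    for target in targets:
        for handler in list(target.handlers):
            target.removeHandler(handler)
        for handler in source.handlers:
            target.addHandler(handler)


source = logging.getLogger('sketch.source')
source.addHandler(logging.NullHandler())
target = logging.getLogger('sketch.target')
target.addHandler(logging.StreamHandler())

copy_handlers(source, [target])
```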

main.py

Holds the main function for running the consistency check

dynamo_consistency.main.compare_with_inventory(site)[source]

Gets the listing from the dynamo database, and remote XRootD listings of a given site. The differences are compared to deletion queues and other things.

Parameters:site (str) – The site to run the check over
Returns:Start time of the running and a dictionary of parameters to report to the summary webpage. See summary.update_summary() parameters for returned keys.
Return type:float, dict
dynamo_consistency.main.extras(site)[source]

Runs a bunch of functions after the main consistency check, depending on the presence of certain arguments and configuration

Parameters:site (str) – For use to pass to extras
Returns:Dictionary with interesting results. Keys include the following:
  • "unmerged" - A tuple listing unmerged files removed and unmerged logs
Return type:dict
dynamo_consistency.main.main(site)[source]

Runs comparison, and extras based on command line. Updates the summary table for normal runs.

Parameters:site (str) – Site to run over
dynamo_consistency.main.make_filters(site)[source]

Creates filters proper for running environment and options

Parameters:site (str) – Site to get activity at
Returns:Three filter objects that can be used to check orphans, missing files, and ignored directories respectively
Return type:filters.Filters, filters.Filters, filters.PatternFilter
dynamo_consistency.main.report_files(inv, remote, missing, orphans, prev_set=None)[source]

Reports files to the history database. If prev_set is given, only missing files that also appear in this set will be invalidated.

Parameters:

messaging.py

A module for handling messages

class dynamo_consistency.messaging.Checker(site=None, timeout=15, locking=True)[source]

Checks the summary every few seconds if it should still be running

Parameters:
  • site (str) – Site to check. If none, read from config.SITE
  • timeout (int) – Number of seconds between checks to summary table
  • locking (bool) – True to get a lock before reading the database. This is mostly to avoid aggressive reading.
isrunning()[source]
Returns:If the site given is supposed to be running
Return type:bool
exception dynamo_consistency.messaging.Killed[source]

An exception to throw when no longer running a site

parser.py

Module that parses the command line for dynamo-consistency

dynamo_consistency.parser.get_parser(modname='__main__', prog='sphinx-build')[source]
Parameters:
  • modname (str) – The module to fetch the __doc__ and, optionally, the __usage__ from. If you want the parser for a particular file, this would usually be __name__
  • prog (str) – The name for the program that we want the parser for
Returns:

A parser based on the program name and the arguments to pass it

Return type:

optparse.OptionParser, list

dynamo_consistency.parser.pretty_exe(name)[source]

Modifies the calling module’s doc string

Parameters:name (str) – The desired heading for the new docstring

picker.py

The part of the summary table handling that also relies on an accurate backend.

dynamo_consistency.picker.pick_site(pattern=None, lockname=None)[source]

This function also does the task of synchronizing the summary database with the inventory’s list of sites that match the pattern.

Parameters:
  • pattern (str) – A regex that needs to be contained in the site name
  • lockname (str) – Name of the lock file that the site should use. Needs to be ‘’ for guaranteed no lock.
Returns:

The name of a site that is ready and hasn’t run in the longest time

Return type:

str

Raises:

NoMatchingSite – If no site matches or is ready
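
The selection rule, the matching site that has gone longest without running, can be sketched over an in-memory mapping. The real function works against the summary database and site readiness; the names pick_oldest and NoSite and the timestamps below are illustrative only.

```python
import re


class NoSite(Exception):
    """Stand-in for the NoMatchingSite exception described above."""


def pick_oldest(last_runs, pattern=None):
    """last_runs maps site name to last run time; return the stalest match."""
    matches = [site for site in last_runs
               if pattern is None or re.search(pattern, site)]
    if not matches:
        raise NoSite('No site matches %r' % pattern)
    return min(matches, key=lambda site: last_runs[site])


last_runs = {'T2_US_MIT': 300, 'T2_US_Nebraska': 100, 'T1_DE_KIT': 50}
picked = pick_oldest(last_runs, pattern='T2_US')
```

re.search is used so that the pattern only needs to be contained in the site name, matching the description of the pattern parameter.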

remotelister.py

Tool to get the files located at a site.

author:

Daniel Abercrombie <dabercro@mit.edu>

Max Goncharov <maxi@mit.edu>

dynamo_consistency.remotelister.listing(site, callback=None, **kwargs)[source]

Get the information for a site, from XRootD or a cache.

Parameters:
  • site (str) – The site name
  • callback (function) – The callback function to pass to create.create_dirinfo()
Returns:

The site directory listing information

Return type:

dynamo_consistency.datatypes.DirectoryInfo

signaling.py

A small module for handling signals

dynamo_consistency.signaling.halt(signum, _)[source]

Halts the current listing using the summary tables

summary.py

Module that handles the summary database and webpage. It will install the summary webpage for you the first time you run the consistency check.

exception dynamo_consistency.summary.BadAction[source]

For raising when one of the following actions wasn’t identified properly

class dynamo_consistency.summary.LockedConn[source]

Holds a connection to the summary database. Includes fh locking itself so that we don’t crash over that

close()[source]

Proxy to close and remove file lock

exception dynamo_consistency.summary.NoMatchingSite[source]

For raising when consistency doesn’t know what site to run on

dynamo_consistency.summary.do_update()[source]

Determines if running under conditions where the summary table should be updated

Returns:True if the update should happen
Return type:bool
dynamo_consistency.summary.get_dst()[source]
Returns:1 for daylight savings time, 0 otherwise, -1 if unsure
Return type:int
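
The same three-valued flag is available from the standard library as the tm_isdst field of a time structure. The sketch below shows one way such a value could be obtained; that get_dst is implemented this way is an assumption.

```python
import time


def dst_flag():
    """Return the local daylight saving flag: 1, 0, or -1."""
    return time.localtime().tm_isdst


flag = dst_flag()
```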
dynamo_consistency.summary.get_sites(reporting=False)[source]
Parameters:reporting (bool) – If true, only get sites that should be reported to dynamo
Returns:The list of sites that are currently in the database
Return type:list
dynamo_consistency.summary.get_status(site)[source]
Returns:Running status of a site
Return type:int
Raises:NoMatchingSite – If no matching site is in the database
dynamo_consistency.summary.install_webpage()[source]

Installs files for webpage in configured Web_Dir

dynamo_consistency.summary.is_debugged(site)[source]
Returns:If the site is cleared for acting on consistency results
Return type:bool
dynamo_consistency.summary.move_local_files(site)[source]

Move files in the working directory to the web page

Parameters:site (str) – The site which has files ready to move
dynamo_consistency.summary.running(site)[source]

Show the site as running on the web page and note the start time

Parameters:site (str) – Site to run
dynamo_consistency.summary.set_reporting(site, status)[source]

Sets the reporting status of a site.

Parameters:
  • site (str) – Site name
  • status (int) – Status flag
Raises:

BadAction – If the status doesn’t make sense

dynamo_consistency.summary.set_status(site, status)[source]

Sets the run status of a site.

Parameters:
  • site (str) – Site name
  • status (int) – Status flag
Raises:

BadAction – If the status doesn’t make sense

dynamo_consistency.summary.unlock_site(site)[source]

Sets the site running status back to 0 if running

Parameters:site (str) – Site to unlock
dynamo_consistency.summary.update_config()[source]

Updates the configuration file at the summary website

dynamo_consistency.summary.update_summary(site, duration, numfiles, numnodes, numempty, nummissing, missingsize, numorphan, orphansize, numnosource, numunrecoverable, numunlisted, numbadunlisted, numunmerged=0, numlogs=0)[source]

Update the summary webpage.

Parameters:
  • site (str) – The site to update the summary for
  • duration (float) – The amount of time it took to run, in seconds
  • numfiles (int) – Number of files in the tree
  • numnodes (int) – Number of directories listed
  • numempty (int) – Number of empty directories to delete
  • nummissing (int) – Number of missing files
  • missingsize (int) – Size of missing files, in bytes
  • numorphan (int) – Number of orphan files
  • orphansize (int) – Size of orphan files, in bytes
  • numnosource (int) – The number of missing files that are on no other disk
  • numunrecoverable (int) – The number of missing files that are not on disk or tape
  • numunlisted (int) – Number of directories that were not listed
  • numbadunlisted (int) – Number of unlisted directories that were not listed due to error
  • numunmerged (int) – Number of files to remove from unmerged (CMS only)
  • numlogs (int) – Number of unmerged files that were logs (CMS only)
Returns:

True if the summary table was updated

Return type:

bool

dynamo_consistency.summary.webdir()[source]

If the web directory does not exist, this function installs it

Returns:The web directory location
Return type:str