Reference

The following is a full reference to the submodules of the dynamo_consistency module.

config.py

Small module to get information from the config.

author:Daniel Abercrombie <dabercro@mit.edu>
dynamo_consistency.config.DIRECTORYLIST = None

If this is set to a list of directories, it overrides the DirectoryList set in the configuration file. This prevents the tool from attempting to list directories that are not there.

dynamo_consistency.config.LOADER = <module 'json' from '/usr/lib/python2.7/json/__init__.pyc'>

A module that uses the load function on a file descriptor to return a dictionary. (Examples are the json and yaml modules.) If your LOCATION is not a JSON file, you will also want to change this before calling config_dict().

dynamo_consistency.config.LOCATION = 'consistency_config.json'

The string giving the location of the configuration JSON file. Generally, you want to set this module-level value before calling config_dict() to get your configuration.

dynamo_consistency.config.config_dict()

This only loads the configuration file the first time it is called. Later calls return the configuration that was already loaded.

Returns:the configuration file in a dictionary
Return type:dict
Raises:IOError – when it cannot find the configuration file
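
The load-once behavior and the LOCATION / LOADER pattern described above can be pictured with a small stand-in. This is a minimal sketch and not the package's own code: the _CACHE variable and the local config_dict below are hypothetical and only mimic the documented behavior. It is written for Python 3, where IOError is an alias of OSError.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for the module-level settings described above.
LOCATION = 'consistency_config.json'
LOADER = json      # any module exposing load(file_object) works here
_CACHE = None      # assumption: some cache is what makes later calls cheap


def config_dict():
    """Load LOCATION with LOADER on the first call, then reuse the result."""
    global _CACHE
    if _CACHE is None:
        # open() raises IOError if the file cannot be found
        with open(LOCATION, 'r') as config_file:
            _CACHE = LOADER.load(config_file)
    return _CACHE


# Usage: point LOCATION at the right file before the first call.
_tmp_dir = tempfile.mkdtemp()
LOCATION = os.path.join(_tmp_dir, 'my_config.json')
with open(LOCATION, 'w') as handle:
    json.dump({'DirectoryList': ['mc', 'data']}, handle)

first = config_dict()
second = config_dict()   # served from the cache, the file is not read again
```

With the real package, the equivalent step would be assigning dynamo_consistency.config.LOCATION (and LOADER, for a non-JSON file) before the first call to dynamo_consistency.config.config_dict().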

datatypes.py

Module defines the datatypes that are used for storage and comparison. There is also a powerful create_dirinfo function that takes a filler function or object and uses the multiprocessing module to recursively list directories in parallel.

author:Daniel Abercrombie <dabercro@mit.edu>
exception dynamo_consistency.datatypes.BadPath

An exception for throwing when the path doesn’t make sense for various methods of a DirectoryInfo

class dynamo_consistency.datatypes.DirectoryInfo(name='', directories=None, files=None)

Stores all of the information of the contents of a directory

Parameters:
  • name (str) – The name of the directory
  • directories (list) – If this is set, the infos in the list are merged into a master DirectoryInfo.
  • files (list) – List of tuples containing information about files in the directory.
add_file_list(file_infos)

Add a list of tuples containing file_name, file_size to the node. This is most useful when you get a list of files from some other source and want to easily convert that list into a DirectoryInfo()

Parameters:file_infos (list) – The list of files (full path, size in bytes[, timestamp])
add_files(files)

Set the files for this DirectoryInfo node

Parameters:files (list) – The tuples of file information. Each element consists of file name, size, and mod time.
Returns:self for chaining calls
Return type:DirectoryInfo
compare(other, path='', check=None)

Does a one-way comparison with a different tree

Parameters:
  • other (DirectoryInfo) – The directory tree to compare this one to
  • path (str) – Is the path to get to this location so far
  • check (function) – An optional function that double checks a file name. If the checking function returns True for a file name, the file will not be included in the output.
Returns:

A tuple of the list of files and the list of directories that are present in this tree but not in the other tree, followed by the total size of those files

Return type:

list, list, long
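
The idea of a one-way comparison can be sketched with plain dictionaries. This is only an illustration under stated assumptions, not the DirectoryInfo implementation: the tree layout ({'files': {name: size}, 'dirs': {name: subtree}}), the function name one_way_compare, and the choice to also report the files inside a directory that is missing from the other tree are all hypothetical.

```python
import posixpath


def one_way_compare(this, other, path='', check=None):
    """Return (files, directories, size) found in `this` but not in `other`."""
    extra_files, extra_dirs, extra_size = [], [], 0

    other_files = other['files'] if other else {}
    for name, size in sorted(this['files'].items()):
        full = posixpath.join(path, name)
        if name in other_files:
            continue
        if check is not None and check(full):
            continue   # the double check says this file should not be reported
        extra_files.append(full)
        extra_size += size

    other_dirs = other['dirs'] if other else {}
    for name, subtree in sorted(this['dirs'].items()):
        full = posixpath.join(path, name)
        if name not in other_dirs:
            extra_dirs.append(full)
        # Recurse either way; a missing counterpart is passed as None
        files, dirs, size = one_way_compare(
            subtree, other_dirs.get(name), full, check)
        extra_files.extend(files)
        extra_dirs.extend(dirs)
        extra_size += size

    return extra_files, extra_dirs, extra_size


inventory = {'files': {}, 'dirs': {
    'mc': {'files': {'a.root': 10, 'b.root': 20}, 'dirs': {}}}}
listing = {'files': {}, 'dirs': {
    'mc': {'files': {'a.root': 10},
           'dirs': {'tmp': {'files': {'x.root': 5}, 'dirs': {}}}}}}

missing = one_way_compare(inventory, listing, '/store')
orphan = one_way_compare(listing, inventory, '/store')
```

Running the comparison in both directions, as above, gives one result for files expected but not listed and another for files listed but not expected.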

count_nodes(empty=False)
Parameters:empty (bool) – If True, only return the number of empty nodes
Returns:The total number of nodes in this DirectoryInfo. This corresponds approximately to the number of listing requests required to build the data.
Return type:int
display(path='')

Print out the contents of this DirectoryInfo

Parameters:path (str) – The full path to this DirectoryInfo instance
displays(path='')

Get the string to print out the contents of this DirectoryInfo.

Parameters:path (str) – The full path to this DirectoryInfo instance
Returns:The display string
Return type:str
empty_nodes_list()

This function should be used to get the nodes to delete in the proper order for non-recursive deletion

Returns:The list of empty directories to delete in the order to delete
Return type:list
empty_nodes_set()

This function recursively builds the entire list of empty directories that can be deleted

Returns:The set of empty directories to delete
Return type:set
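
The difference between the unordered set and the ordered list matters when directories are removed one at a time: a directory can only be removed after everything beneath it is gone. A minimal sketch of such an ordering follows, assuming the empty directories are given as '/'-separated path strings. The helper name deletion_order is hypothetical and the package may order its list differently.

```python
def deletion_order(empty_dirs):
    """Order paths so each directory comes after all of its subdirectories."""
    # Deeper paths contain more separators, so they sort first;
    # the path itself breaks ties to keep the result reproducible.
    return sorted(empty_dirs, key=lambda path: (-path.count('/'), path))


empty = {'/store/mc/a', '/store/mc/a/b', '/store/mc/a/b/c', '/store/mc/d'}
ordered = deletion_order(empty)
```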
get_directory_size()

Report the total size used by this directory and its subdirectories.

Returns:Size of files in directory, in bytes
Return type:int
get_file(file_name)

Get the file dictionary based off the name.

Parameters:file_name (str) – The LFN of the file
Returns:Dictionary of file information
Return type:dict
Raises:BadPath – if the file_name does not start with self.name
get_files(min_age=0, path='')

Get the list of files that are older than some age

Parameters:
  • min_age (int) – The minimum age, in seconds, of files to list
  • path (str) – The path to this file. Used for recursive calls
Returns:

List of full file paths

Return type:

list
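
The age filter can be illustrated for a single directory as follows. This is a sketch under assumptions rather than the method itself: it does not recurse into subdirectories, the name files_older_than and the now parameter are hypothetical, and whether the real comparison is inclusive at exactly min_age seconds is not stated in this reference.

```python
import posixpath
import time


def files_older_than(files, min_age=0, path='', now=None):
    """Return full paths of files whose age is at least `min_age` seconds.

    files: list of (name, size, mod time) tuples
    now: hypothetical parameter that makes the example reproducible
    """
    now = time.time() if now is None else now
    return [
        posixpath.join(path, name)
        for name, _size, mtime in files
        if now - mtime >= min_age
    ]


files = [('old.root', 100, 1000), ('new.root', 200, 9000)]
aged = files_older_than(files, min_age=5000, path='/store/mc', now=10000)
everything = files_older_than(files, path='/store/mc', now=10000)
```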

get_node(path, make_new=True)

Get the node that corresponds to the path given. If the node does not exist yet, and make_new is True, the node is created.

Parameters:
  • path (str) – Path to the desired node from current node. If the path does not exist yet, empty nodes will be created.
  • make_new (bool) – Whether to create a new node if none exists at path
Returns:

A node with the proper path, unless make_new is False and the node doesn’t exist

Return type:

DirectoryInfo or None

get_num_files(unlisted=False, place_new=False)

Report the total number of files stored.

Parameters:
  • unlisted (bool) – If true, return the number of unlisted directories. Otherwise, return only the number of successfully listed files.
  • place_new (bool) – If true, pretend there’s one more file inside any new directory or any directory where files is None. This keeps the listing of empty directories from including directories that should not actually be deleted.
Returns:

The number of files in the directory tree structure

Return type:

int

get_unlisted(path='')
Parameters:path (str) – Path to prepend to the name, used in recursive calls
Returns:List of directories that were unlisted
Return type:list
listdir(*args, **kwargs)

Get the list of directory names within a DirectoryInfo. Adding an argument will display the contents of the next directory. For example, if dir.listdir() returns:

0: data
1: mc

dir.listdir(1) then lists the contents of mc and dir.listdir(1, 0) lists the contents of the first subdirectory in mc.

Parameters:
  • args – The list of indices used to descend into the subdirectories
  • kwargs – Supports ‘printing’, which is set to a bool. Defaults to True.
Returns:

The DirectoryInfo that is being listed

Return type:

DirectoryInfo

remove_node(path_name)

Remove an empty node from the DirectoryInfo

Parameters:

path_name (str) – The path to the node, including the self.name at the beginning

Returns:

self for chaining

Return type:

DirectoryInfo

Raises:
  • NotEmpty – if the directory is not empty or self.files is None
  • BadPath – if the path_name does not start with the self.name
save(file_name)

Save this DirectoryInfo in a file.

Parameters:file_name (str) – is the location to save the file
setup_hash()

Set the hashes for this DirectoryInfo

dynamo_consistency.datatypes.LOG = <logging.Logger object>

The maximum age, in days, of files and directories to ignore in this check. This variable should be reset once in a while by daemons that run while an operator might be adjusting the configuration.

exception dynamo_consistency.datatypes.NotEmpty

An exception for throwing when a non-empty directory is deleted from a DirectoryInfo

dynamo_consistency.datatypes.compare(inventory, listing, output_base=None, orphan_check=None, missing_check=None)

Compare two different trees and output the differences into ASCII files

Parameters:
  • inventory (DirectoryInfo) – The tree of files that should be at a site
  • listing (DirectoryInfo) – The tree of files that are listed remotely
  • output_base (str) – The names of the ASCII report files are generated from this variable.
  • orphan_check (function) – A function that double checks each expected orphan. The function takes an LFN as input. If the function returns true, the LFN will not be listed as an orphan.
  • missing_check (function) – A function that double checks each expected missing file. The function takes an LFN as input. If the function returns true, the LFN will not be listed as missing.
Returns:

The two lists, missing and orphan files

Return type:

tuple
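
A check function of the kind accepted by orphan_check and missing_check only needs to take an LFN and return a truthy value for files that should be left out of the report. A minimal sketch follows. The helper make_prefix_check and the example prefixes and LFNs are hypothetical and chosen purely for illustration.

```python
def make_prefix_check(prefixes):
    """Build a check that returns True for any LFN under one of `prefixes`."""
    prefixes = tuple(prefixes)

    def check(lfn):
        # str.startswith accepts a tuple and tests each prefix in turn
        return lfn.startswith(prefixes)

    return check


# Hypothetical protected areas whose files should never count as orphans
orphan_check = make_prefix_check(['/store/unmerged/', '/store/temp/'])

candidates = [
    '/store/mc/file1.root',
    '/store/unmerged/file2.root',
    '/store/temp/file3.root',
]
# Mimic the documented rule: an LFN is dropped when the check returns True
orphans = [lfn for lfn in candidates if not orphan_check(lfn)]
```

A function built this way could then be passed as the orphan_check or missing_check argument.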

dynamo_consistency.datatypes.create_dirinfo(location, first_dir, filler, object_params=None, callback=None)

Create the directory information

Parameters:
  • location (str) – This is the beginning of the path where we will find first_dir. For example, to find the first directory mc, we also have to say where it is. For using CMS LFNs, location would be /store (where mc is inside). This is a path.
  • first_dir (str) – The name of the first directory that is inside the path of location. This should not be a path, but the name of the directory to list recursively.
  • filler (function or constructor) –

    This is either a function that lists the directory contents given just a path of os.path.join(location, first_dir), or it is a constructor that does the same thing with a member function called list. If filler is an object constructor, the parameters for the object creation must be passed through the parameter object_params. Both listings must return the following tuple:

    • A bool saying whether the listing was successful or not
    • A list of tuples of sub-directories and their mod times
    • A list of tuples of files inside, their sizes, and their mod times
  • object_params (list) – This only needs to be set when filler is an object constructor. Each element in the list is a tuple of arguments to pass to the constructor.
  • callback (function) – A function that is called every time the master thread has finished checking the child threads. This can happen many times at large sites. The function is called with the main DirectoryTree as its argument
Returns:

A DirectoryInfo object containing all of the directory listings from os.path.join(location, first_dir) with name first_dir.

Return type:

DirectoryInfo
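
The three-part return value required of a filler can be illustrated with a function that lists a local directory. This is a sketch under assumptions: the name local_filler is hypothetical, and returning bare entry names (rather than full paths) and integer modification times is a guess at what the caller expects.

```python
import os
import tempfile


def local_filler(path):
    """List `path` and return (success, directories, files).

    directories: list of (name, mod time)
    files: list of (name, size in bytes, mod time)
    """
    directories, files = [], []
    try:
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            stat = os.stat(full)
            if os.path.isdir(full):
                directories.append((name, int(stat.st_mtime)))
            else:
                files.append((name, stat.st_size, int(stat.st_mtime)))
    except OSError:
        # Report the failure instead of raising, so the caller can
        # record this directory as unlisted
        return False, [], []
    return True, directories, files


# Small demonstration on a temporary directory
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'mc'))
with open(os.path.join(root, 'a.root'), 'w') as handle:
    handle.write('1234')

ok, dirs, listed_files = local_filler(root)
failed = local_filler(os.path.join(root, 'does_not_exist'))
```

Under those assumptions, a function like this would be the filler argument, with location and first_dir chosen so that os.path.join(location, first_dir) is the directory to start from.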

dynamo_consistency.datatypes.get_info(file_name)

Get the DirectoryInfo from a file.

Parameters:file_name (str) – is the location of the saved information
Returns:Saved info
Return type:DirectoryInfo

remotelister.py

Tool to get the files located at a site.

author:

Daniel Abercrombie <dabercro@mit.edu>

Max Goncharov <maxi@mit.edu>

dynamo_consistency.remotelister.listing(site, callback=None, **kwargs)

Get the information for a site, from XRootD or a cache.

Parameters:
  • site (str) – The site name
  • callback (function) – The callback function to pass to datatypes.create_dirinfo()
Returns:

The site directory listing information

Return type:

dynamo_consistency.datatypes.DirectoryInfo

inventorylister.py

This module gets the information from the inventory about a site’s contents

author:Daniel Abercrombie <dabercro@mit.edu>
dynamo_consistency.inventorylister.filter_files(site, pathstrip)

Gets the files from the inventory and filters them through the configuration’s DirectoryList

Parameters:
  • site (str) – The site to get the files from
  • pathstrip (int) – The length of the root node’s name that is stripped from the directory name for filtering
Returns:

Tuples for adding to dynamo_consistency.datatypes.DirectoryInfo.add_file_list()

Return type:

generator

dynamo_consistency.inventorylister.listing(site, callback=None, **kwargs)

Get the list of files from the inventory.

Parameters:site (str) – The name of the site to load
Returns:The file replicas that are supposed to be at a site
Return type:dynamo_consistency.datatypes.DirectoryInfo