Reference¶
The following is a full reference to the submodules inside of the dynamo_consistency module.
checkphedex.py¶
A module that provides functions to check the comparison results to the list of files and deletions in PhEDEx.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
dynamo_consistency.checkphedex.check_for_datasets(site, orphan_list_file)[source]¶ Checks PhEDEx exhaustively to see if a dataset should exist at a site, according to PhEDEx, but has files marked as orphans according to our check. This is done via the PhEDEx filereplicas API. The number of filereplicas for each dataset is printed to the terminal. Datasets that contain any filereplicas are returned by this function.
Parameters: Returns: A list giving the number of files and the dataset name for each dataset that is supposed to have at least one file at the site.
Return type: list of tuples
-
dynamo_consistency.checkphedex.get_phedex_tree(site, callback=None, **kwargs)[source]¶ Get the file list tree from PhEDEx. Uses the InventoryAge configuration to determine when to refresh cache.
Parameters: site (str) – The site to get information from PhEDEx for. Returns: A tree containing file replicas that are supposed to be at the site Return type: dynamo_consistency.datatypes.DirectoryInfo
-
dynamo_consistency.checkphedex.set_of_deletions(site)[source]¶ Get the set of datasets with approved deletion requests at a given site that were created within the number of days given by the IgnoreAge configuration parameter. This request is done via the PhEDEx deleterequests API.
Parameters: site (str) – The site that we want the deletion requests for. Returns: Datasets that are in deletion requests Return type: set
config.py¶
Small module to get information from the config.
Warning
Must be used on a machine with xrdfs installed (for locate command).
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
dynamo_consistency.config.CONFIG_FILE= 'consistency_config.json'¶ The string giving the location of the configuration JSON file. Generally, you want to set this value of the module before calling config_dict() to get your configuration.
-
dynamo_consistency.config.DIRECTORYLIST= None¶ If this is set to a list of directories, it overrides the DirectoryList set in the configuration file. This prevents the tool from attempting to list directories that are not there.
-
dynamo_consistency.config.LOADER= <module 'json' from '/usr/lib/python2.7/json/__init__.pyc'>¶ A module that uses the load function on a file descriptor to return a dictionary. (Examples are the json and yaml modules.) If your CONFIG_FILE is not a JSON file, you’ll want to change this also before calling config_dict().
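The pattern described for CONFIG_FILE and LOADER can be sketched as follows. This is a minimal stand-in and not the module's own code: load_config and demo are hypothetical names, and the configuration values are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for the module-level settings described above.
CONFIG_FILE = 'consistency_config.json'
LOADER = json  # any module exposing load(file_object) -> dict


def load_config(path=None, loader=None):
    """Read the configuration file with the loader's load() function."""
    with open(path or CONFIG_FILE, 'r') as handle:
        return (loader or LOADER).load(handle)


def demo():
    """Write a small JSON file, point CONFIG_FILE at it, and read it back."""
    global CONFIG_FILE
    descriptor, temp_path = tempfile.mkstemp(suffix='.json')
    with os.fdopen(descriptor, 'w') as output:
        # Illustrative values only; real configurations hold more keys.
        json.dump({'IgnoreAge': 2, 'DirectoryList': ['mc', 'data']}, output)
    CONFIG_FILE = temp_path
    try:
        return load_config()
    finally:
        os.remove(temp_path)


settings = demo()
```

Swapping LOADER for another module with a compatible load function (the text mentions yaml) would let the same call read a non-JSON file.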
-
dynamo_consistency.config.config_dict(make_dir=True)[source]¶ Parameters: make_dir (bool) – Create the cache directory if it’s missing
Returns: the configuration file in a dictionary
Return type: Raises:
-
dynamo_consistency.config.get_redirector(site, banned_doors=None)[source]¶ Get the redirector and xrootd door servers for a given site. An example valid site name is T2_US_MIT.
Parameters: Returns: Public hostname of the local redirector and a list of xrootd door servers
Return type:
datatypes.py¶
Module defines the datatypes that are used for storage and comparison. There is also a powerful create_dirinfo function that takes a filler function or object and uses the multiprocessing module to recursively list directories in parallel.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
exception
dynamo_consistency.datatypes.BadPath[source]¶ An exception for throwing when the path doesn’t make sense for various methods of a
DirectoryInfo
-
class
dynamo_consistency.datatypes.DirectoryInfo(name='', directories=None, files=None)[source]¶ Stores all of the information of the contents of a directory
Parameters: - name (str) – The name of the directory
- directories (list) – If this is set, the infos in the list are merged into a master DirectoryInfo.
- files (list) – List of tuples containing information about files in the directory.
-
add_file_list(file_infos)[source]¶ Add a list of tuples containing file_name, file_size to the node. This is most useful when you get a list of files from some other source and want to easily convert that list into a DirectoryInfo().
Parameters: file_infos (list) – The list of files (full path, size in bytes[, timestamp])
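To illustrate what converting a flat file list into a tree involves, here is a minimal sketch using nested dictionaries. It does not use DirectoryInfo itself; build_tree and the dictionary layout are assumptions made for this example.

```python
def build_tree(file_infos):
    """Nest (full_path, size[, timestamp]) tuples by directory name."""
    root = {'dirs': {}, 'files': []}
    for info in file_infos:
        full_path, size = info[0], info[1]
        parts = full_path.strip('/').split('/')
        node = root
        # Walk (and create when needed) one level per directory component.
        for directory in parts[:-1]:
            node = node['dirs'].setdefault(
                directory, {'dirs': {}, 'files': []})
        node['files'].append((parts[-1], size))
    return root


tree = build_tree([
    ('/store/mc/a/file1.root', 100),
    ('/store/mc/a/file2.root', 250),
    ('/store/data/b/file3.root', 75),
])
```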
-
add_files(files)[source]¶ Set the files for this DirectoryInfo node
Parameters: files (list) – The tuples of file information. Each element consists of file name, size, and mod time. Returns: self for chaining calls Return type: DirectoryInfo
-
compare(other, path='', check=None)[source]¶ Does one way comparison with a different tree
Parameters: - other (DirectoryInfo) – The directory tree to compare this one to
- path (str) – Is the path to get to this location so far
- check (function) – An optional function that double checks a file name. If the checking function returns True for a file name, the file will not be included in the output.
Returns: Tuple of list of files and directories that are present and not in the other tree and the size of the files that corresponds to
Return type:
-
count_nodes(empty=False)[source]¶ Parameters: empty (bool) – If True, only return the number of empty nodes Returns: The total number of nodes in this Directory Info. This corresponds to approximately the number of listing requests required to build the data. Return type: int
-
display(path='')[source]¶ Print out the contents of this DirectoryInfo
Parameters: path (str) – The full path to this DirectoryInfo instance
-
displays(path='')[source]¶ Get the string to print out the contents of this DirectoryInfo.
Parameters: path (str) – The full path to this DirectoryInfo instance Returns: The display string Return type: str
-
empty_nodes_list()[source]¶ This function should be used to get the nodes to delete in the proper order for non-recursive deletion
Returns: The list of empty directories to delete in the order to delete Return type: list
-
empty_nodes_set()[source]¶ This function recursively builds the entire list of empty directories that can be deleted
Returns: The set of empty directories to delete Return type: set
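The ordering requirement for non-recursive deletion can be sketched as below: a directory may only be removed after everything beneath it is gone, so children are listed before their parents. This is an illustration on a plain nested dictionary, not the DirectoryInfo implementation, and the library may apply further criteria (such as directory age) that are not modelled here.

```python
def count_files(tree):
    """Count the files in this node and everything beneath it."""
    return len(tree['files']) + sum(
        count_files(child) for child in tree['dirs'].values())


def empty_directories(tree, path=''):
    """List directories holding no files, children before parents."""
    ordered = []
    for name in sorted(tree['dirs']):
        child = tree['dirs'][name]
        child_path = path + '/' + name
        # Recurse first so deeper directories come earlier in the list.
        ordered.extend(empty_directories(child, child_path))
        if count_files(child) == 0:
            ordered.append(child_path)
    return ordered


example = {'files': [], 'dirs': {'mc': {'files': [], 'dirs': {
    'old': {'files': [], 'dirs': {'deep': {'files': [], 'dirs': {}}}},
    'new': {'files': ['f.root'], 'dirs': {}},
}}}}

deletion_order = empty_directories(example)
deletion_set = set(deletion_order)
```

The list form preserves the safe deletion order, while the set form is convenient for membership tests.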
-
get_directory_size()[source]¶ Report the total size used by this directory and its subdirectories.
Returns: Size of files in directory, in bytes Return type: int
-
get_file(file_name)[source]¶ Get the file dictionary based off the name.
Parameters: file_name (str) – The LFN of the file Returns: Dictionary of file information Return type: dict Raises: BadPath – if the file_name does not start with self.name
-
get_files(min_age=0, path='')[source]¶ Get the list of files that are older than some age
Parameters: Returns: List of full file paths
Return type:
-
get_node(path, make_new=True)[source]¶ Get the node that corresponds to the path given. If the node does not exist yet, and make_new is True, the node is created.
Parameters: Returns: A node with the proper path, unless make_new is False and the node doesn’t exist
Return type:
-
get_num_files(unlisted=False, place_new=False)[source]¶ Report the total number of files stored.
Parameters: - unlisted (bool) – If true, return the number of unlisted directories. Otherwise, return only the number of successfully listed files.
- place_new (bool) – If true, pretend there’s one more file inside any new directory or if files is None. This prevents listing of empty directories to include directories that should not actually be deleted.
Returns: The number of files in the directory tree structure
Return type:
-
get_unlisted(path='')[source]¶ Parameters: path (str) – Path to prepend to the name, used in recursive calls Returns: List of directories that were unlisted Return type: list
-
listdir(*args, **kwargs)[source]¶ Get the list of directory names within a DirectoryInfo. Adding an argument will display the contents of the next directory. For example, if dir.listdir() returns:
0: data
1: mc
dir.listdir(1) then lists the contents of mc and dir.listdir(1, 0) lists the contents of the first subdirectory in mc.
Parameters: - args – Is a list of indices to list the subdirectories
- kwargs – Supports ‘printing’ which is set to a bool. Defaults as True.
Returns: The DirectoryInfo that is being listed
Return type:
-
remove_node(path_name)[source]¶ Remove an empty node from the DirectoryInfo
Parameters: path_name (str) – The path to the node, including the self.name at the beginning
Returns: self for chaining
Return type: Raises:
-
save(file_name)[source]¶ Save this DirectoryInfo in a file.
Parameters: file_name (str) – is the location to save the file
-
setup_hash()[source]¶ Set the hashes for this
DirectoryInfo
-
dynamo_consistency.datatypes.IGNORE_AGE= 1.0¶ The maximum age, in days, of files and directories to ignore in this check. This variable should be reset once in a while by daemons that run while an operator might be adjusting the configuration.
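As a rough illustration of an age cutoff expressed in days, the sketch below treats anything modified within the last IGNORE_AGE days as too new to check. The helper name is_ignored and the strict inequality at the boundary are assumptions, not the module's documented behavior.

```python
import time

IGNORE_AGE = 1.0  # days, mirroring the documented default

SECONDS_PER_DAY = 24 * 3600


def is_ignored(mod_time, now=None, ignore_age=IGNORE_AGE):
    """Return True when mod_time falls within the last ignore_age days."""
    now = time.time() if now is None else now
    return (now - mod_time) < ignore_age * SECONDS_PER_DAY


reference = 1500000000  # fixed "current" time so the example is repeatable
recent = is_ignored(reference - 3600, now=reference)
old = is_ignored(reference - 2 * SECONDS_PER_DAY, now=reference)
```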
-
exception
dynamo_consistency.datatypes.NotEmpty[source]¶ An exception for throwing when a non-empty directory is deleted from a
DirectoryInfo
-
dynamo_consistency.datatypes.compare(inventory, listing, output_base=None, orphan_check=None, missing_check=None)[source]¶ Compare two different trees and output the differences into an ASCII file
Parameters: - inventory (DirectoryInfo) – The tree of files that should be at a site
- listing (DirectoryInfo) – The tree of files that are listed remotely
- output_base (str) – The base from which the names of the ASCII report files are generated.
- orphan_check (function) – A function that double checks each expected orphan. The function takes as an input, an LFN. If the function returns true, the LFN will not be listed as an orphan.
- missing_check (function) – A function that double checks each expected missing file. The function takes as an input, an LFN. If the function returns true, the LFN will not be listed as missing.
Returns: The two lists, missing and orphan files
Return type:
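The roles of the two check functions can be sketched with plain sets of LFNs standing in for the two trees. This is a simplified illustration that assumes flat sets rather than DirectoryInfo objects; one_way and compare_listings are hypothetical names and no report files are written.

```python
def one_way(expected, other, check=None):
    """Names in expected but not in other, unless check() vetoes them."""
    return sorted(
        name for name in expected
        if name not in other and not (check and check(name))
    )


def compare_listings(inventory, listing,
                     orphan_check=None, missing_check=None):
    """Return (missing, orphan) lists from two one-way comparisons."""
    # Missing: expected by the inventory but absent from the site listing.
    missing = one_way(inventory, listing, missing_check)
    # Orphan: present in the site listing but unknown to the inventory.
    orphan = one_way(listing, inventory, orphan_check)
    return missing, orphan


inventory = {'/store/mc/a.root', '/store/mc/b.root'}
listing = {'/store/mc/b.root', '/store/mc/c.root', '/store/mc/d.root'}

missing, orphan = compare_listings(
    inventory, listing,
    orphan_check=lambda lfn: lfn.endswith('d.root'))
```

Here the orphan check returns true for d.root, so that file is left out of the orphan list even though the inventory does not know it.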
-
dynamo_consistency.datatypes.create_dirinfo(location, first_dir, filler, object_params=None, callback=None)[source]¶ Create the directory information
Parameters: - location (str) – This is the beginning of the path where we will find first_dir. For example, to find the first directory mc, we also have to say where it is. In most cases, using LFNs, location would be /store/ (where mc is inside). This is a path.
- first_dir (str) – The name of the first directory that is inside the path of location. This should not be a path, but the name of the directory to list recursively.
- filler (function or constructor) – This is either a function that lists the directory contents given just a path of os.path.join(location, first_dir), or it is a constructor that does the same thing with a member function called list. If filler is an object constructor, the parameters for the object creation must be passed through the parameter object_params. Both listings must return the following tuple:
  - A bool saying whether the listing was successful or not
  - A list of tuples of sub-directories and their mod times
  - A list of tuples of files inside, their sizes, and their mod times
- object_params (list) – This only needs to be set when filler is an object constructor. Each element in the list is a tuple of arguments to pass to the constructor.
- callback (function) – A function that is called every time the master thread has finished checking the child threads. This can happen very many times at large sites. The function is called with the main DirectoryTree as its argument
Returns: A DirectoryInfo object, named first_dir, containing everything found by the directory listings under os.path.join(location, first_dir).
Return type:
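A filler function with the return shape described above might look like the following sketch, which lists a local directory with the standard library. The name local_filler is hypothetical, and returning bare entry names (rather than full paths) is an assumption about what the tuples should contain.

```python
import os
import shutil
import tempfile


def local_filler(path):
    """List one directory, returning (success, directories, files)."""
    directories, files = [], []
    try:
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            info = os.stat(full)
            if os.path.isdir(full):
                # (directory name, mod time in seconds since epoch)
                directories.append((name, int(info.st_mtime)))
            else:
                # (file name, size in bytes, mod time)
                files.append((name, info.st_size, int(info.st_mtime)))
    except OSError:
        # Report a failed listing instead of raising.
        return False, [], []
    return True, directories, files


# Small demonstration on a temporary directory.
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'sub'))
with open(os.path.join(base, 'f.root'), 'w') as output:
    output.write('12345')
ok, dirs, files = local_filler(base)
shutil.rmtree(base)
```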
-
dynamo_consistency.datatypes.get_info(file_name)[source]¶ Get the
DirectoryInfofrom a file.Parameters: file_name (str) – is the location of the saved information Returns: Saved info Return type: DirectoryInfo
getsitecontents.py¶
Tool to get the files located at a site.
Warning
Must be used on a machine with XRootD python module installed.
| author: | Daniel Abercrombie <dabercro@mit.edu> Max Goncharov <maxi@mit.edu> |
|---|
-
class
dynamo_consistency.getsitecontents.GFalLister(site, thread_num=None)[source]¶ An object to list a site through gfal-ls calls
-
class
dynamo_consistency.getsitecontents.Lister(thread_num, site)[source]¶ The prototype of the listing facility
Parameters: -
list(path, retries=0)[source]¶ Return the directory contents at the given path. The list member is expected of every object passed to datatypes.
Parameters: Returns: A bool indicating the success, a list of directories, and a list of files. The list of directories consists of tuples of (directory name, mod time). The list of files consists of tuples of (file name, size, mod time). The modification times are in seconds from epoch and the file size is in bytes.
Return type:
-
-
class
dynamo_consistency.getsitecontents.XRootDLister(site, door, thread_num=None)[source]¶ A class that holds two XRootD connections. If the primary connection fails to list a directory, then a fallback connection is used. This keeps the load of listing from hitting more than half of a site’s doors at a time.
Parameters: -
ls_directory(**kwargs)[source]¶ Gets the contents of the previously defined redirector at a given path
Parameters: path (str) – The full path, starting with /store/, of the directory to list. Returns: A bool indicating the success, a list of directories, and a list of files. Return type: bool, list, list
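The primary-then-fallback behavior described for XRootDLister can be sketched without any XRootD dependency. FallbackLister and the two door functions below are hypothetical stand-ins; real connections, retries, and error handling are not modelled.

```python
class FallbackLister(object):
    """Try a primary listing callable first, then a fallback one."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def list(self, path):
        """Return (success, directories, files) from the first success."""
        result = self.primary(path)
        if result[0]:
            return result
        # Only reach for the second connection when the first one fails.
        return self.fallback(path)


def broken_door(path):
    """Stand-in for a connection that cannot list the directory."""
    return False, [], []


def working_door(path):
    """Stand-in for a connection that lists the directory successfully."""
    return True, [('sub', 1500000000)], [('file.root', 1024, 1500000000)]


lister = FallbackLister(broken_door, working_door)
ok, directories, files = lister.list('/store/mc')
```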
-
-
class
dynamo_consistency.getsitecontents.XRootDSubShell(site, door, thread_num=None)[source]¶ Very similar to the
XRootDLister, but uses a subshell through pexpect.
-
dynamo_consistency.getsitecontents.ct_timestamp(line)[source]¶ Takes a time string from gfal and extracts the time since epoch
Parameters: line (str) – The line from the gfal-ls call including month, day, and year in some format with lots of hyphens Returns: Timestamp’s time since epoch Return type: int
-
dynamo_consistency.getsitecontents.get_site_tree(site, callback=None, **kwargs)[source]¶ Get the information for a site, from XRootD or a cache.
Parameters: - site (str) – The site name
- callback (function) – The callback function to pass to
datatypes.create_dirinfo()
Returns: The site directory listing information
Return type:
getinventorycontents.py¶
This module gets the information from the inventory about a site’s contents
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
class
dynamo_consistency.getinventorycontents.InvLoader[source]¶ Creates an InventoryManager object, if needed, and stores it globally for the module. It also holds a list of datasets in the deletion queue.
-
dynamo_consistency.getinventorycontents.get_db_listing(site, callback=None, **kwargs)[source]¶ Get the list of files from dynamo database directly from MySQL.
Parameters: site (str) – The name of the site to load Returns: The file replicas that are supposed to be at a site Return type: dynamo_consistency.datatypes.DirectoryInfo
-
dynamo_consistency.getinventorycontents.get_site_inventory(site, callback=None, **kwargs)[source]¶ Loads the contents of a site, based on the dynamo inventory
Parameters: site (str) – The name of the site to load Returns: The file replicas that are supposed to be at a site Return type: dynamo_consistency.datatypes.DirectoryInfo