Front End Reference¶
A simple consistency check on a site can be run with the Python API as follows, provided an instance of dynamo is installed:
from dynamo_consistency import config, datatypes, remotelister, inventorylister
config.LOCATION = '/path/to/config.json'
site = 'T2_US_MIT' # For example
inventory_listing = inventorylister.listing(site)
remote_listing = remotelister.listing(site)
datatypes.compare(inventory_listing, remote_listing, 'results')
In this example, the list of file LFNs in the inventory and not at the site will be in results_missing.txt.
The list of file LFNs at the site and not in the inventory will be in results_orphan.txt.
The listing functions can be re-implemented to perform the desired check.
This is described in more detail in Back End Requirements.
The following is a full reference to the submodules directly inside of the dynamo_consistency module.
These are the modules intended to be interacted with by a typical user.
To see more details about the backend connections to dynamo
and remote sites, see Back End Requirements.
cache.py¶
-
dynamo_consistency.cache.
cache_tree
(config_age, location_suffix)[source]¶ A decorator for caching pickle files based on the configuration file. It is currently set up to decorate a function that has a single parameter, site.
The returned function can also be passed a keyword argument to override the location_suffix argument. This is done with the cache argument to the function.
Parameters:
Returns: A function that uses the caching configuration
Return type: func
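The caching idea can be illustrated with a minimal, self-contained sketch. It is not the implementation of cache_tree: the names cache_by_age, max_age_seconds, cache_dir, and fake_listing are hypothetical, and the maximum age is passed directly rather than read from the configuration file.

```python
import functools
import os
import pickle
import tempfile
import time


def cache_by_age(max_age_seconds, cache_dir):
    """Hypothetical pickle-caching decorator (an assumption, not the library's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(site, cache=None):
            # The `cache` keyword overrides the file-name suffix
            suffix = cache or func.__name__
            path = os.path.join(cache_dir, '%s_%s.pkl' % (site, suffix))
            fresh = (os.path.exists(path) and
                     time.time() - os.path.getmtime(path) < max_age_seconds)
            if fresh:
                with open(path, 'rb') as handle:
                    return pickle.load(handle)
            result = func(site)
            with open(path, 'wb') as handle:
                pickle.dump(result, handle)
            return result
        return wrapper
    return decorator


CALLS = []


@cache_by_age(max_age_seconds=3600, cache_dir=tempfile.mkdtemp())
def fake_listing(site):
    CALLS.append(site)   # record each real (uncached) call
    return {'site': site, 'files': ['/store/mc/a.root']}


first = fake_listing('T2_US_MIT')
second = fake_listing('T2_US_MIT')   # read back from the pickle file
```

In this sketch the second call never reaches fake_listing itself, because the pickle file written by the first call is still younger than the allowed age.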
config.py¶
Small module to get information from the config.
author: Daniel Abercrombie <dabercro@mit.edu>
-
dynamo_consistency.config.
DIRECTORYLIST
= None¶ If this is set to a list of directories, it overrides the
DirectoryList
set in the configuration file. This prevents the tool from attempting to list directories that are not there.
-
dynamo_consistency.config.
LOADER
= <module 'json' from '/home/docs/.pyenv/versions/2.7.16/lib/python2.7/json/__init__.pyc'>¶ A module that uses the load function on a file descriptor to return a dictionary. (Examples are the json and yaml modules.) If your LOCATION is not a JSON file, you’ll want to change this also before calling config_dict().
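The loader contract described above is small: anything with a load function that takes an open file and returns a dictionary. The following sketch shows that contract in isolation. KeyValueLoader, read_config, and the MaxThreads key are hypothetical names used only for illustration, and the sketch does not import dynamo_consistency.

```python
import io
import json


class KeyValueLoader(object):
    """Hypothetical loader that parses 'Key = value' lines into a dictionary."""

    @staticmethod
    def load(file_obj):
        output = {}
        for line in file_obj:
            line = line.strip()
            if not line or line.startswith('#'):
                continue   # skip blank lines and comments
            key, value = line.split('=', 1)
            output[key.strip()] = value.strip()
        return output


def read_config(file_obj, loader=json):
    # Any object exposing load(file_obj) -> dict can be swapped in here
    return loader.load(file_obj)


json_conf = read_config(io.StringIO(u'{"MaxThreads": 4}'))
kv_conf = read_config(io.StringIO(u'# comment\nMaxThreads = 4\n'),
                      loader=KeyValueLoader)
```

Note that the hypothetical key-value loader returns every value as a string, so a real replacement loader would need to produce the same value types the rest of the tool expects.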
-
dynamo_consistency.config.
LOCATION
= 'consistency_config.json'¶ The string giving the location of the configuration JSON file. Generally, you want to set this value of the module before calling
config_dict()
to get your configuration.
-
dynamo_consistency.config.
SITE
= None¶ A global place that stores a site that has been picked. Set in
dynamo_consistency.picker.pick_site()
.
create.py¶
Module that contains all of the threading functions needed to create a
dynamo_consistency.datatypes.DirectoryInfo
object.
author: Daniel Abercrombie <dabercro@mit.edu>
-
class
dynamo_consistency.create.
ListingThread
(number, recv, send, filler)[source]¶ A thread that does the listing
Parameters: - number (int) – A unique number of the thread created. This is used to generate a thread name and to fetch a logger.
- recv (multiprocessing.Pipe) – One end of a pipe for communicating with the master thread. Receives messages here.
- send (multiprocessing.Queue) – A queue to send messages to the master thread.
- filler (function) –
The function that takes a path as an argument and returns:
- A bool saying whether the listing was successful or not
- A list of tuples of sub-directories and their mod times
- A list of tuples of files inside, their sizes, and their mod times
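As an illustration of the return format listed above, here is a minimal sketch of a filler that lists a local directory with the standard library. The name local_filler is hypothetical, and a real filler for a remote site would replace the os calls with its own listing mechanism.

```python
import os
import tempfile


def local_filler(path):
    """List one local directory in the three-part format described above."""
    try:
        names = sorted(os.listdir(path))
    except OSError:
        # Listing failed: report failure with empty contents
        return False, [], []
    directories, files = [], []
    for name in names:
        full = os.path.join(path, name)
        stat = os.stat(full)
        if os.path.isdir(full):
            directories.append((name, int(stat.st_mtime)))
        else:
            files.append((name, stat.st_size, int(stat.st_mtime)))
    return True, directories, files


# Small demonstration on a temporary directory
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'mc'))
with open(os.path.join(root, 'file.root'), 'w') as handle:
    handle.write('12345')

ok, dirs, files = local_filler(root)
```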
-
out_queue
= <multiprocessing.queues.Queue object>¶ A queue where all of the listing outputs are placed. A master thread is expected to read from and clear this.
-
static
put_first_dir
(location, directory)[source]¶ Place the first set of parameters for the ListingThread objects to start from. Should not be called once the threads are started.
Parameters: - location (str) – This is the beginning of the path where we will find first_dir. For example, to find the first directory mc, we also have to say where it is. For using CMS LFNs, location would be /store (where mc is inside).
- directory (str) – Name of the first directory to run over inside of location
Returns: The name that the first DirectoryInfo object should be. This is just the first directory in the directory parameter.
Return type:
-
dynamo_consistency.create.
create_dirinfo
(location, first_dir, filler, object_params=None, callback=None)[source]¶ Create the directory information
Parameters: - location (str) – This is the beginning of the path where we will find first_dir. For example, to find the first directory mc, we also have to say where it is. For using CMS LFNs, location would be /store (where mc is inside). This is a path.
- first_dir (str) – The name of the first directory that is inside the path of location. This should not be a path, but the name of the directory to list recursively.
- filler (function or constructor) –
This is either a function that lists the directory contents given just a path of os.path.join(location, first_dir), or it is a constructor that does the same thing with a member function called list. If filler is an object constructor, the parameters for the object creation must be passed through the parameter object_params. Both listings must return the following tuple:
- A bool saying whether the listing was successful or not
- A list of tuples of sub-directories and their mod times
- A list of tuples of files inside, their sizes, and their mod times
- object_params (list) – This only needs to be set when filler is an object constructor. Each element in the list is a tuple of arguments to pass to the constructor.
- callback (function) – A function that is called every time the master thread has finished checking the child threads. This can happen very many times at large sites. The function is called with the main DirectoryTree as its argument
Returns: A DirectoryInfo object containing everything in the directory listings from os.path.join(location, first_dir), with name first_dir.
Return type:
Raises: messaging.Killed – When a site has been stopped
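The constructor form of filler can be sketched as follows. This is an assumption-laden illustration rather than code from the package: TableLister and TABLE are hypothetical, and the in-memory table stands in for a remote site. It shows one object being built per tuple in object_params, each offering a list method that returns the required three-part tuple.

```python
class TableLister(object):
    """Hypothetical lister backed by an in-memory table instead of a remote site."""

    def __init__(self, name, table):
        self.name = name     # stands in for per-object settings in a real lister
        self.table = table   # maps path -> (directories, files)

    def list(self, path):
        if path not in self.table:
            return False, [], []
        directories, files = self.table[path]
        return True, directories, files


TABLE = {
    '/store/mc': ([('sub', 1500000000)], [('a.root', 100, 1500000000)]),
    '/store/mc/sub': ([], []),
}

# One tuple of constructor arguments per object to create
object_params = [('lister-0', TABLE), ('lister-1', TABLE)]
listers = [TableLister(*params) for params in object_params]

ok, directories, files = listers[0].list('/store/mc')
```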
datatypes.py¶
Module defines the datatypes that are used for storage and comparison. There is also a powerful create_dirinfo function that takes a filler function or object and uses the multiprocessing module to recursively list directories in parallel.
author: Daniel Abercrombie <dabercro@mit.edu>
-
exception
dynamo_consistency.datatypes.
BadPath
[source]¶ An exception for throwing when the path doesn’t make sense for various methods of a
DirectoryInfo
-
class
dynamo_consistency.datatypes.
DirectoryInfo
(name='', directories=None, files=None)[source]¶ Stores all of the information of the contents of a directory
Parameters: - name (str) – The name of the directory
- directories (list) – If this is set, the infos in the list are merged into a master DirectoryInfo.
- files (list) – List of tuples containing information about files in the directory.
-
add_file_list
(file_infos)[source]¶ Add a list of tuples containing file_name, file_size to the node. This is most useful when you get a list of files from some other source and want to easily convert that list into a
DirectoryInfo()
Parameters: file_infos (list) – The list of files (full path, size in bytes[, timestamp])
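To make the conversion from a flat file list to a tree concrete, here is a minimal sketch that uses nested dictionaries in place of DirectoryInfo. The function build_tree and the dictionary layout are hypothetical; only the input format (full path, size in bytes, optional timestamp) follows the description above.

```python
def build_tree(file_infos):
    """Convert (full path, size[, timestamp]) tuples into nested dictionaries."""
    root = {'directories': {}, 'files': []}
    for info in file_infos:
        full_path, size = info[0], info[1]
        parts = full_path.strip('/').split('/')
        node = root
        # Walk (and create when needed) every directory above the file
        for directory in parts[:-1]:
            node = node['directories'].setdefault(
                directory, {'directories': {}, 'files': []})
        node['files'].append((parts[-1], size))
    return root


tree = build_tree([
    ('/store/mc/run1/a.root', 100),
    ('/store/mc/run1/b.root', 200, 1500000000),   # timestamp ignored in this sketch
    ('/store/data/c.root', 300),
])
store = tree['directories']['store']
```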
-
add_files
(files)[source]¶ Set the files for this DirectoryInfo node.
Parameters: files (list) – The tuples of file information. Each element consists of file name, size, and mod time.
Returns: self for chaining calls
Return type: DirectoryInfo
-
compare
(other, path='', check=None)[source]¶ Does a one-way comparison with a different tree
Parameters: - other (DirectoryInfo) – The directory tree to compare this one to
- path (str) – Is the path to get to this location so far
- check (function) – An optional function that double checks a file name.
If the checking function returns
True
for a file name, the file will not be included in the output.
Returns: Tuple of list of files and directories that are present and not in the other tree and the size of the files that corresponds to
Return type:
-
count_nodes
(empty=False)[source]¶ Parameters: empty (bool) – If True, only return the number of empty nodes Returns: The total number of nodes in this DirectoryInfo. This corresponds to approximately the number of listing requests required to build the data. Return type: int
-
display
(path='')[source]¶ Print out the contents of this DirectoryInfo.
Parameters: path (str) – The full path to this DirectoryInfo instance
-
displays
(path='')[source]¶ Get the string to print out the contents of this DirectoryInfo.
Parameters: path (str) – The full path to this DirectoryInfo instance
Returns: The display string
Return type: str
-
empty_nodes_list
()[source]¶ This function should be used to get the nodes to delete in the proper order for non-recursive deletion
Returns: The list of empty directories to delete in the order to delete Return type: list
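The ordering requirement for non-recursive deletion is that every directory appears after all of its own subdirectories. One simple way to get such an order, shown here as a sketch and not as the method this package uses, is to sort by path depth with the deepest paths first. The name deletion_order is hypothetical.

```python
def deletion_order(empty_dirs):
    """Sort so that children come before parents (deepest paths first)."""
    return sorted(empty_dirs, key=lambda path: (-path.count('/'), path))


order = deletion_order({
    '/store/mc/a',
    '/store/mc/a/b',
    '/store/mc/a/b/c',
    '/store/mc/d',
})
```

With this order, a plain non-recursive remove (such as os.rmdir on a local file system) never meets a directory that still contains another empty directory from the list.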
-
empty_nodes_set
()[source]¶ This function recursively builds the entire list of empty directories that can be deleted
Returns: The set of empty directories to delete Return type: set
-
get_directory_size
()[source]¶ Report the total size used by this directory and its subdirectories.
Returns: Size of files in directory, in bytes Return type: int
-
get_file
(file_name)[source]¶ Get the file dictionary based off the name.
Parameters: file_name (str) – The LFN of the file Returns: Dictionary of file information Return type: dict Raises: BadPath – if the file_name does not start with self.name
-
get_files
(min_age=0, path='')[source]¶ Get the list of files that are older than some age
Parameters: Returns: List of full file paths
Return type:
-
get_node
(path, make_new=True)[source]¶ Get the node that corresponds to the path given. If the node does not exist yet, and make_new is True, the node is created.
Parameters:
Returns: A node with the proper path, unless make_new is False and the node doesn’t exist
Return type:
-
get_num_files
(unlisted=False, place_new=False)[source]¶ Report the total number of files stored.
Parameters: - unlisted (bool) – If true, return the number of unlisted directories; otherwise, return only the number of successfully listed files
- place_new (bool) – If true, pretend there’s one more file inside any new directory or if files is None. This prevents listing of empty directories to include directories that should not actually be deleted.
Returns: The number of files in the directory tree structure
Return type:
-
get_unlisted
(path='')[source]¶ Parameters: path (str) – Path to prepend to the name, used in recursive calls Returns: List of directories that were unlisted Return type: list
-
listdir
(*args, **kwargs)[source]¶ Get the list of directory names within a DirectoryInfo. Adding an argument will display the contents of the next directory. For example, if dir.listdir() returns:
0: data
1: mc
dir.listdir(1) then lists the contents of mc and dir.listdir(1, 0) lists the contents of the first subdirectory in mc.
Parameters: - args – Is a list of indices to list the subdirectories
- kwargs – Supports ‘printing’ which is set to a bool. Defaults as True.
Returns: The DirectoryInfo that is being listed
Return type:
-
remove_node
(path_name)[source]¶ Remove an empty node from the DirectoryInfo
Parameters: path_name (str) – The path to the node, including the self.name at the beginning
Returns: self for chaining
Return type:
Raises:
-
save
(file_name)[source]¶ Save this DirectoryInfo in a file.
Parameters: file_name (str) – The location to save the file
-
setup_hash
()[source]¶ Set the hashes for this
DirectoryInfo
-
exception
dynamo_consistency.datatypes.
NotEmpty
[source]¶ An exception for throwing when a non-empty directory is deleted from a
DirectoryInfo
-
dynamo_consistency.datatypes.
compare
(inventory, listing, output_base=None, orphan_check=None, missing_check=None)[source]¶ Compare two different trees and output the differences into an ASCII file
Parameters: - inventory (DirectoryInfo) – The tree of files that should be at a site
- listing (DirectoryInfo) – The tree of files that are listed remotely
- output_base (str) – The names of the ASCII files to place the reports are generated from this variable.
- orphan_check (function) – A function that double checks each expected orphan. The function takes as an input, an LFN. If the function returns true, the LFN will not be listed as an orphan.
- missing_check (function) – A function checks each expected missing file The function takes as an input, an LFN. If the function returns true, the LFN will not be listed as missing.
Returns: The two lists, missing and orphan files
Return type:
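The roles of orphan_check and missing_check can be illustrated on plain sets of LFNs. This is a simplified sketch under the assumption that each tree can be reduced to a set of file names. The function compare_lfns is hypothetical and does not write the ASCII report files.

```python
def compare_lfns(inventory, listing, orphan_check=None, missing_check=None):
    """Return (missing, orphan) lists from two sets of LFNs."""
    # Missing: expected by the inventory but not listed at the site
    missing = sorted(lfn for lfn in inventory - listing
                     if not (missing_check and missing_check(lfn)))
    # Orphan: listed at the site but unknown to the inventory
    orphan = sorted(lfn for lfn in listing - inventory
                    if not (orphan_check and orphan_check(lfn)))
    return missing, orphan


inventory = {'/store/mc/a.root', '/store/mc/b.root'}
listing = {'/store/mc/b.root', '/store/mc/c.root', '/store/unmerged/d.root'}

# A check that returns True removes the LFN from the corresponding report
missing, orphan = compare_lfns(
    inventory, listing,
    orphan_check=lambda lfn: '/unmerged/' in lfn)
```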
-
dynamo_consistency.datatypes.
get_info
(file_name)[source]¶ Get the DirectoryInfo from a file.
Parameters: file_name (str) – The location of the saved information
Returns: Saved info
Return type: DirectoryInfo
emptyremover.py¶
Defines a class that can remove nodes from a DirectoryInfo
object
-
class
dynamo_consistency.emptyremover.
EmptyRemover
(site, check=None)[source]¶ This class handles the removal of empty directories from the tree by behaving as a callback. It also calls deletions for the registry at the same time.
Parameters: - site (str) – Site name. If value is None, then don’t enter deletions into the registry, but still remove node from tree
- check (function) – The function to check against orphans to not delete. The full path name is passed to the function. If it returns True, the directory is not deleted.
filters.py¶
Defines tools for filtering file names out of the compare result
-
class
dynamo_consistency.filters.
Filters
(*args)[source]¶ Holds multiple functions for filtering out file names from the dynamo_consistency.datatypes.compare() function.
Parameters: args – An optional list of functions to build the filter
-
class
dynamo_consistency.filters.
PatternFilter
(patterns)[source]¶ This tells if the named file contains one of the ignored patterns. These are just checked to see that the file name contains one of the listed strings. There’s no regex in here.
Parameters: patterns (list) – List of “patterns” to check.
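Substring matching without regular expressions, and the combination of several filter functions, can be sketched as follows. The classes SubstringFilter and AnyFilter and the method name protected are hypothetical stand-ins chosen for this example. They are not taken from the package's source.

```python
class SubstringFilter(object):
    """Hypothetical filter: True when a file name contains any listed string."""

    def __init__(self, patterns):
        self.patterns = list(patterns)

    def protected(self, file_name):
        # Plain substring test, so regex characters have no special meaning
        return any(pattern in file_name for pattern in self.patterns)


class AnyFilter(object):
    """Hypothetical combination: True when any held function returns True."""

    def __init__(self, *funcs):
        self.funcs = list(funcs)

    def protected(self, file_name):
        return any(func(file_name) for func in self.funcs)


ignore = SubstringFilter(['/store/unmerged/', '.tmp'])
combined = AnyFilter(ignore.protected, lambda name: name.endswith('.log'))
```

A bound method such as combined.protected could then be passed wherever a check function taking a file name is expected.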
history.py¶
Handles the invalidation of files through a separate read-write process
-
class
dynamo_consistency.history.
LockedConn
[source]¶ Similar to dynamo_consistency.summary.LockedConn. We want to handle the history database here, though.
-
dynamo_consistency.history.
empty_directories
(site, acting=False)[source]¶ Get the list of empty directories. If acting on them, the directories are moved into the history database.
Parameters: Returns: The directory list
Return type:
-
dynamo_consistency.history.
finish_run
()[source]¶ Called in
dynamo_consistency.main.main()
to register the end of a consistency run
-
dynamo_consistency.history.
missing_files
(site, acting=False)[source]¶ Get the missing files from the consistency database. If the caller identifies itself as acting on the list, the list is moved into the history with the acted flag True.
Parameters: Returns: The LFNs that were missing
Return type:
-
dynamo_consistency.history.
orphan_files
(site, acting=False)[source]¶ Get the orphan files from the consistency database. If the caller identifies itself as acting on the list, the list is moved into the history with the acted flag True.
Parameters: Returns: The LFNs that were orphan
Return type:
-
dynamo_consistency.history.
report_empty
(directories)[source]¶ Adds empty directories to history database
Parameters: directories (list) – A list of directory names and mtime (in seconds)
-
dynamo_consistency.history.
report_missing
(missing)[source]¶ Stores a list of missing files in the invalidation table
Parameters: missing (list) – A list of tuples, where each tuple is a name, info dict pair
-
dynamo_consistency.history.
report_orphan
(orphan)[source]¶ Stores a list of orphan files in the orphan table
Parameters: orphan (list) – A list of tuples, where each tuple is a name, info dict pair
-
dynamo_consistency.history.
report_unmerged
(unmerged)[source]¶ Stores a list of deletable unmerged files in the orphan table
Parameters: unmerged (list) – A list of tuples, where each tuple is a name, info dict pair
-
dynamo_consistency.history.
start_run
()[source]¶ Called in
dynamo_consistency.main.main()
to register the start of a consistency run
-
dynamo_consistency.history.
unmerged_files
(site, acting=False)[source]¶ Get the deletable unmerged files from the consistency database. If the caller identifies itself as acting on the list, the list is moved into the history with the acted flag True.
Parameters: Returns: The LFNs in unmerged that are deletable
Return type:
inventorylister.py¶
This module gets the information from the inventory about a site’s contents
author: Daniel Abercrombie <dabercro@mit.edu>
-
dynamo_consistency.inventorylister.
filter_files
(site, pathstrip)[source]¶ Gets the files from the inventory and filters them through the configuration’s DirectoryList
Parameters: Returns: Tuples for adding to
dynamo_consistency.datatypes.DirectoryInfo.add_file_list()
Return type: generator
-
dynamo_consistency.inventorylister.
listing
(site, callback=None, **kwargs)[source]¶ Get the list of files from the inventory.
Parameters: site (str) – The name of the site to load Returns: The file replicas that are supposed to be at a site Return type: dynamo_consistency.datatypes.DirectoryInfo
logsetup.py¶
The module that sets up logging for us
-
dynamo_consistency.logsetup.
change_logfile
(*filenames)[source]¶ Changes the output file of all of the loggers. Creates any directories that are needed to hold the logs.
Parameters: filenames – The files to write new logs to
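The general pattern of pointing a logger at a new file, creating missing directories first, can be sketched with the standard logging module. This is only an illustration for a single logger and a single file. The function switch_logfile and the logger name are hypothetical, and the sketch makes no claim about how change_logfile treats multiple file names.

```python
import logging
import os
import tempfile


def switch_logfile(logger, filename):
    """Replace the logger's handlers with one FileHandler writing to filename."""
    directory = os.path.dirname(filename)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)   # create any directories needed for the log
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    handler = logging.FileHandler(filename)
    logger.addHandler(handler)
    return handler


logger = logging.getLogger('consistency_sketch')
logger.setLevel(logging.INFO)
logfile = os.path.join(tempfile.mkdtemp(), 'logs', 'site', 'run.log')

handler = switch_logfile(logger, logfile)
logger.info('listing started')
handler.flush()
```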
-
dynamo_consistency.logsetup.
match_logs
(source, targets)[source]¶ Parameters: - source (logging.Logger) – Logger that has handlers to use
- targets (list) – List of loggers that need handlers updated
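Updating the handlers of several loggers from one source logger can be sketched with the standard logging module as follows. The function share_handlers and the logger names are hypothetical, and the actual behavior of match_logs may differ in detail.

```python
import logging


def share_handlers(source, targets):
    """Give every target logger the same handlers as the source logger."""
    for target in targets:
        # Drop whatever handlers the target had before
        for handler in list(target.handlers):
            target.removeHandler(handler)
        for handler in source.handlers:
            target.addHandler(handler)


source = logging.getLogger('sketch.source')
source.addHandler(logging.NullHandler())
targets = [logging.getLogger('sketch.a'), logging.getLogger('sketch.b')]
targets[0].addHandler(logging.NullHandler())   # an old handler to be replaced

share_handlers(source, targets)
```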
main.py¶
Holds the main function for running the consistency check
-
dynamo_consistency.main.
compare_with_inventory
(site)[source]¶ Gets the listing from the dynamo database, and remote XRootD listings of a given site. The differences are compared to deletion queues and other things.
Parameters: site (str) – The site to run the check over
Returns: Start time of the running and a dictionary of parameters to report to the summary webpage. See summary.update_summary() parameters for returned keys.
Return type: float, dict
-
dynamo_consistency.main.
extras
(site)[source]¶ Runs a bunch of functions after the main consistency check, depending on the presence of certain arguments and configuration
Parameters: site (str) – For use to pass to extras Returns: Dictionary with interesting results. Keys include the following: "unmerged"
- A tuple listing unmerged files removed and unmerged logs
Return type: dict
-
dynamo_consistency.main.
main
(site)[source]¶ Runs comparison, and extras based on command line. Updates the summary table for normal runs.
Parameters: site (str) – Site to run over
-
dynamo_consistency.main.
make_filters
(site)[source]¶ Creates filters proper for running environment and options
Parameters: site (str) – Site to get activity at
Returns: Three filters.Filter objects that can be used to check orphans, missing files, and ignored directories respectively
Return type: filters.Filter, filters.Filter, filters.PatternFilter
-
dynamo_consistency.main.
report_files
(inv, remote, missing, orphans, prev_set=None)[source]¶ Reports files to the history database. If prev_set is given, only missing files that also appear in this set will be invalidated.
Parameters: - inv (dynamo_consistency.datatypes.DirectoryInfo) – The inventory listing
- remote (dynamo_consistency.datatypes.DirectoryInfo) – The remote listing
- missing (list) – Missing files
- orphans (list) – Orphan files
- prev_set (set) – Set of files that were missing in the previous run
messaging.py¶
A module for handling messages
parser.py¶
Module that parses the command line for dynamo-consistency
picker.py¶
The bit of the summary table that also relies on an accurate backend.
-
dynamo_consistency.picker.
pick_site
(pattern=None, lockname=None)[source]¶ This function also does the task of synchronizing the summary database with the inventory’s list of sites that match the pattern.
Parameters:
Returns: The name of a site that is ready and hasn’t run in the longest time
Return type:
Raises: NoMatchingSite – If no site matches or is ready
remotelister.py¶
Tool to get the files located at a site.
author: Daniel Abercrombie <dabercro@mit.edu>, Max Goncharov <maxi@mit.edu>
-
dynamo_consistency.remotelister.
listing
(site, callback=None, **kwargs)[source]¶ Get the information for a site, from XRootD or a cache.
Parameters: - site (str) – The site name
- callback (function) – The callback function to pass to
create.create_dirinfo()
Returns: The site directory listing information
Return type:
signaling.py¶
A small module for handling signals
summary.py¶
Module that handles the summary database and webpage. It will install the summary webpage for you the first time you run the consistency check.
-
exception
dynamo_consistency.summary.
BadAction
[source]¶ For raising when one of the following actions wasn’t identified properly
-
class
dynamo_consistency.summary.
LockedConn
[source]¶ Holds a connection to the summary database. Includes fh locking itself so that we don’t crash over that
-
exception
dynamo_consistency.summary.
NoMatchingSite
[source]¶ For raising when consistency doesn’t know what site to run on
-
dynamo_consistency.summary.
do_update
()[source]¶ Determines if running under conditions where the summary table should be updated
Returns: True if the update should happen Return type: bool
-
dynamo_consistency.summary.
get_dst
()[source]¶ Returns: 1 for daylight savings time, 0 otherwise, -1 if unsure Return type: int
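The three return values described above match the tm_isdst field of the standard library's time.struct_time. A minimal sketch of reading that flag follows. Whether get_dst is implemented this way is an assumption, and dst_flag is a hypothetical name.

```python
import time


def dst_flag(timestamp=None):
    """Return 1 if DST is in effect, 0 if not, -1 if unknown."""
    # time.localtime uses the current time when timestamp is None
    return time.localtime(timestamp).tm_isdst


now_flag = dst_flag()
epoch_flag = dst_flag(0)
```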
-
dynamo_consistency.summary.
get_sites
(reporting=False)[source]¶ Parameters: reporting (bool) – If true, only get sites that should be reported to dynamo Returns: The list of sites that are currently in the database Return type: list
-
dynamo_consistency.summary.
get_status
(site)[source]¶ Returns: Running status of a site Return type: int Raises: NoMatchingSite – If no matching site is in the database
-
dynamo_consistency.summary.
install_webpage
()[source]¶ Installs files for webpage in configured Web_Dir
-
dynamo_consistency.summary.
is_debugged
(site)[source]¶ Returns: If the site is cleared for acting on consistency results Return type: bool
-
dynamo_consistency.summary.
move_local_files
(site)[source]¶ Move files in the working directory to the web page
Parameters: site (str) – The site which has files ready to move
-
dynamo_consistency.summary.
running
(site)[source]¶ Show the site as running on the web page and note the start time
Parameters: site (str) – Site to run
-
dynamo_consistency.summary.
set_reporting
(site, status)[source]¶ Sets the reporting status of a site.
Parameters: Raises: BadAction – If the status doesn’t make sense
-
dynamo_consistency.summary.
set_status
(site, status)[source]¶ Sets the run status of a site.
Parameters: Raises: BadAction – If the status doesn’t make sense
-
dynamo_consistency.summary.
unlock_site
(site)[source]¶ Sets the site running status back to 0 if running
Parameters: site (str) – Site to unlock
-
dynamo_consistency.summary.
update_config
()[source]¶ Updates the configuration file at the summary website
-
dynamo_consistency.summary.
update_summary
(site, duration, numfiles, numnodes, numempty, nummissing, missingsize, numorphan, orphansize, numnosource, numunrecoverable, numunlisted, numbadunlisted, numunmerged=0, numlogs=0)[source]¶ Update the summary webpage.
Parameters: - site (str) – The site to update the summary for
- duration (float) – The amount of time it took to run, in seconds
- numfiles (int) – Number of files in the tree
- numnodes (int) – Number of directories listed
- numempty (int) – Number of empty directories to delete
- nummissing (int) – Number of missing files
- missingsize (int) – Size of missing files, in bytes
- numorphan (int) – Number of orphan files
- orphansize (int) – Size of orphan files, in bytes
- numnosource (int) – The number of missing files that are on no other disk
- numunrecoverable (int) – The number of missing files that are not on disk or tape
- numunlisted (int) – Number of directories that were not listed
- numbadunlisted (int) – Number of unlisted directories that were not listed due to error
- numunmerged (int) – Number of files to remove from unmerged (CMS only)
- numlogs (int) – Number of unmerged files that were logs (CMS only)
Returns: True if the summary table was updated
Return type: