projit package
Submodules
projit.ascii_plot module
- projit.ascii_plot.arange(beg, end, step)[source]
Utility function to emulate arange from earlier Python versions.
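A minimal sketch of what such a utility might look like follows. It assumes the function behaves like numpy.arange and returns evenly spaced values from beg up to but excluding end. The name arange_sketch is hypothetical, and this is not the projit source.

```python
def arange_sketch(beg, end, step):
    """Return evenly spaced values in [beg, end) (hypothetical re-implementation)."""
    if step == 0:
        raise ValueError("step must be non-zero")
    values = []
    i = 0
    current = beg
    # Handle both increasing and decreasing ranges.
    while (step > 0 and current < end) or (step < 0 and current > end):
        values.append(current)
        i += 1
        # Compute each value from its index to avoid accumulating float error.
        current = beg + i * step
    return values


print(arange_sketch(0, 1, 0.25))
```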
- projit.ascii_plot.ascii_plot(ydata, xdata=None, logscale=False, pch='o', xlabel='X', ylabel='Y', width=72, height=50)[source]
Generate an ASCII art plot of a set of data points.
This function was taken from the GitHub gist https://gist.github.com/fransua/6165813. It was modified to work with Python 3, to provide neater formatting of the tick labels, and to fix some problems with extreme values occasionally being omitted.
- Note: The plot title parameter was removed because this function is called from within functions that print their own titles before the call.
- Parameters:
ydata – list of values to be plotted
xdata (None) – x coordinates corresponding to ydata. If None, the x values range from 1 to the length of ydata.
logscale (False) – display data with logarithmic Y axis
pch ('o') – string used to mark points (e.g. +, =, -, *)
xlabel ('X') – label for the X axis
ylabel ('Y') – label for the Y axis
width (72) – width in characters
height (50) – height in characters
- Returns:
string corresponding to plot
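The following minimal sketch illustrates the general approach. It is not the gist's code. It scales each point onto a fixed character grid and marks the point with pch. It assumes a linear Y axis and omits tick labels and axis labels. The name tiny_ascii_plot is hypothetical.

```python
def tiny_ascii_plot(ydata, xdata=None, pch="o", width=20, height=5):
    """Place each (x, y) point on a width x height character grid."""
    if xdata is None:
        # Mirror the documented default: x ranges from 1 to len(ydata).
        xdata = list(range(1, len(ydata) + 1))
    xmin, xmax = min(xdata), max(xdata)
    ymin, ymax = min(ydata), max(ydata)
    # Avoid division by zero when all values are equal.
    xspan = (xmax - xmin) or 1
    yspan = (ymax - ymin) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xdata, ydata):
        col = round((x - xmin) / xspan * (width - 1))
        # Row 0 is the top of the plot, so invert the y position.
        row = height - 1 - round((y - ymin) / yspan * (height - 1))
        grid[row][col] = pch
    # Left border on each row, then a bottom axis line.
    return "\n".join("|" + "".join(r) for r in grid) + "\n+" + "-" * width


print(tiny_ascii_plot([1, 4, 9, 16, 25]))
```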
projit.cli module
- projit.cli.extract_max_tags_lengths(project, asset, tags)[source]
CLI Internal Function: determine the maximum length of the content inside a specific set of tags on an asset in the project.
- Parameters:
project (Projit, required) – The projit project object
asset (String, required) – The asset type
tags (list(String), required) – The tags to search for
- Returns:
List of tag lengths
- Return type:
list(Int)
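The following is a minimal sketch of this computation. It assumes tags are stored as a mapping from asset name to a dictionary of tag values. That layout is hypothetical, and a plain dictionary stands in for a Projit object. Treating missing tags as length 0 is also an assumption.

```python
def max_tag_lengths(tags_by_name, tags):
    """Return, for each requested tag, the longest value length across assets.

    tags_by_name: hypothetical dict of {asset_name: {tag: value}}.
    """
    lengths = []
    for tag in tags:
        longest = max(
            (len(str(t.get(tag, ""))) for t in tags_by_name.values()),
            default=0,  # no assets at all
        )
        lengths.append(longest)
    return lengths


example = {
    "exp_a": {"model": "xgboost", "owner": "ann"},
    "exp_b": {"model": "lr"},
}
print(max_tag_lengths(example, ["model", "owner"]))
```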
- projit.cli.filler(current, max_len, content=' ')[source]
Internal function to generate the fill characters (spaces by default) that pad content of length current out to max_len
- Parameters:
current (Int, required) – The length of the current content
max_len (Int, required) – The maximum string length
content (Char, optional) – The character to fill with (default ‘ ‘)
- Returns:
filled_content
- Return type:
String
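The following is a minimal sketch based on one reading of the description above: filler returns the fill character repeated enough times to bring content of length current up to max_len. This is an interpretation, not the projit source. The usage lines show how such padding might align a CLI column.

```python
def filler_sketch(current, max_len, content=" "):
    """Return content repeated (max_len - current) times, or '' if already long enough."""
    return content * max(0, max_len - current)


name = "exp_a"
# Pad the first column to 10 characters before printing the next field.
row = name + filler_sketch(len(name), 10) + "| done"
print(repr(row))
```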
- projit.cli.print_results_latex(title, df)[source]
Print a results table in LaTeX format. This is kept in a central function so that the functionality or format can be changed in one place in the future.
- Parameters:
title (String, required) – The table title
df (DataFrame, required) – The dataframe to print out
- Returns:
None
- Return type:
None
- projit.cli.task_add(project, asset, name, path)[source]
Add elements to a project from the command line
- projit.cli.task_compare(project, datasets, metric, format, precision)[source]
CLI Internal Task Function: Compare results across multiple datasets.
This command loads the results for each dataset and extracts just the records for the specified metric to compile the comparison dataset to display.
- Parameters:
project (Projit, required) – The projit project object
datasets (list(String), required) – The list of datasets to compare
metric (String, required) – The metric to use for comparison
format (String, required) – The output format (markdown|latex|default)
precision (Int, required) – The precision for results in the table
- Returns:
None
- Return type:
None
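The following is a minimal sketch of the comparison step. It assumes each dataset's results are available as a list of (experiment, metric, value) records. That layout is hypothetical; the real command works on results loaded from the project. The sketch keeps only the requested metric and pivots the data into one entry per experiment with one value per dataset, rounded to the requested precision.

```python
def compare_metric(results_by_dataset, metric, precision=3):
    """Build {experiment: {dataset: rounded_value}} for a single metric."""
    table = {}
    for dataset, records in results_by_dataset.items():
        for experiment, name, value in records:
            if name != metric:
                continue  # discard every other metric
            table.setdefault(experiment, {})[dataset] = round(value, precision)
    return table


results = {
    "train": [("exp_a", "rmse", 1.23456), ("exp_a", "mae", 0.9)],
    "test": [("exp_a", "rmse", 1.5), ("exp_b", "rmse", 1.41421)],
}
print(compare_metric(results, "rmse", precision=2))
```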
- projit.cli.task_init(name, template='')[source]
CLI Internal Task Function: Initialise a project from the command line. This function will initiate a project with a blank description. Users will need to update the description afterwards.
- Parameters:
name (String, required) – The name of the project
template (String, optional) – The name of the template to use when initialising
- Returns:
None
- Return type:
None
- projit.cli.task_list(subcmd, project, dataset, format, precision, tags)[source]
CLI Internal Task Function: List content of a project from the command line
- projit.cli.task_render(project, path)[source]
Generate a PDF of the project and write it to the provided path
- Parameters:
project (Projit, required) – The projit project object
path (String, required) – The path to write the PDF to
- projit.cli.task_rm(project, asset, name)[source]
Remove elements from a project via the command line
- projit.cli.task_status(project)[source]
CLI Internal Task Function: Print the project properties to the command line
- Parameters:
project (Projit, required) – The projit project object
- Returns:
None
- Return type:
None
projit.config module
projit.latex_table module
Support function for generating a LaTeX table from a pandas DataFrame. This function avoids the need for additional dependencies.
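The following is a minimal sketch of a dependency-free LaTeX table generator. It uses plain column and row lists rather than a DataFrame, so it runs without pandas. The name to_latex_table is hypothetical. The sketch does not escape LaTeX special characters such as & or %.

```python
def to_latex_table(columns, rows, caption=None):
    """Render column names and row sequences as a LaTeX table/tabular environment."""
    lines = [r"\begin{table}[h]", r"\centering"]
    # One left-aligned column specifier per column.
    lines.append(r"\begin{tabular}{" + "l" * len(columns) + "}")
    lines.append(r"\hline")
    lines.append(" & ".join(columns) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        lines.append(" & ".join(str(v) for v in row) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    if caption:
        lines.append(r"\caption{" + caption + "}")
    lines.append(r"\end{table}")
    return "\n".join(lines)


print(to_latex_table(["experiment", "rmse"], [("exp_a", 1.23)], caption="Results"))
```

With a DataFrame df, the same sketch could be called as to_latex_table(list(df.columns), df.itertuples(index=False)).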
projit.pdf module
projit.projit module
- class projit.projit.Projit(path, name, desc='', experiments=[], datasets={}, results={}, params={}, hyperparams={}, dataresults={}, executions={}, tags={})[source]
Bases: object
Projit class. A data structure that contains the core elements of a data science project. It permits loose coupling between processes and experiments while providing a simple overarching structure for communication and documentation.
- add_dataset(name, path)[source]
Add a named dataset to the project.
- Parameters:
name (string, required) – The dataset name
path (string, required) – The path to the dataset (a local path, URL or S3 bucket)
- Returns:
None
- Return type:
None
- add_experiment(name, path)[source]
Add a new experiment to the project, then save the project configuration. This function will overwrite an existing experiment of the same name and delete any previous results.
- Parameters:
name (string, required) – The experiment name
path (string, required) – The path to the experiment.
- Returns:
None
- Return type:
None
- add_hyperparam(name, value)[source]
Add a set of hyperparameters to the project.
- Parameters:
name (string, required) – The experiment name
value (Dictionary) – The dictionary of hyperparameters
- Returns:
None
- Return type:
None
- add_param(name, value)[source]
Add a parameter to the project.
- Parameters:
name (string, required) – The parameter name
value (Any) – The value taken by that parameter
- Returns:
None
- Return type:
None
- add_result(experiment, metric, value, dataset=None)[source]
Add results from an experiment to the project.
Results can be overall project results or associated with a specific dataset.
- Parameters:
experiment (string, required) – The experiment name
metric (string, required) – The name of the metric we are adding.
value (float, required) – The value of the metric to add.
dataset (string, optional) – The dataset against which the results are generated
- Returns:
None
- Return type:
None
- add_tags(asset, name, tags)[source]
Add tags to a specific asset
- Parameters:
asset (string, required) – The asset type (experiment|dataset)
name (string, required) – The asset name
tags (Dictionary(string:string)) – The dictionary of tags
- Returns:
None
- Return type:
None
- clean_experimental_results(name)[source]
Remove all results for a given experiment
- Parameters:
name (string, required) – The experiment name
- Returns:
None
- Return type:
None
- create_local_path(ds)[source]
Create and return a path to a dataset. Internal use.
- Returns:
Path to dataset
- Return type:
String
- dataset_exists(name)[source]
Check if a given dataset is in the data structure
- Parameters:
name (string, required) – The dataset name
- Returns:
exists
- Return type:
Boolean
- end_experiment(name, id, hyperparams={})[source]
End an experiment execution. This function requires both the experiment name and the hash ID of the previously started execution.
- Parameters:
name (string, required) – The experiment name (Unique Identifier)
id (string, required) – The execution hash ID returned by start_experiment
hyperparams (Dictionary, optional) – Optional dictionary of hyperparameters used in the experiment execution
- Returns:
None
- Return type:
None
- experiment_exists(name)[source]
Check if a given experiment is in the data structure
- Parameters:
name (string, required) – The experiment name
- Returns:
exists
- Return type:
Boolean
- get_dataset(name)[source]
Retrieve the dataset by name.
- Parameters:
name (string, required) – The dataset to retrieve
- Returns:
Path to dataset
- Return type:
String
- get_execution_times(name)[source]
Given an experiment name, return a list of all execution times.
- Parameters:
name (string, required) – The experiment name (Unique Identifier)
- Returns:
execution_times : Array of execution times
- Return type:
list(float)
- get_experiment_execution_stats(name)[source]
Given an experiment name, return the execution statistics.
- Parameters:
name (string, required) – The experiment name (Unique Identifier)
- Returns:
executions, mean_execution_time : The number of executions and the mean execution time
- Return type:
int, float
- get_mean_execution_time(name)[source]
Given an experiment name, return the mean execution time.
- Parameters:
name (string, required) – The experiment name (Unique Identifier)
- Returns:
mean_execution_time : The mean time of execution
- Return type:
float
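The following minimal sketch shows how the three execution statistics above relate to each other. It assumes each execution is recorded with start and end timestamps in seconds. The field names start and end are hypothetical. The sketch also assumes that executions which have not ended are skipped.

```python
from statistics import mean


def execution_times(executions):
    """Durations in seconds for executions that have both a start and an end."""
    return [e["end"] - e["start"] for e in executions.values() if "end" in e]


def execution_stats(executions):
    """Return (number of completed executions, mean execution time)."""
    times = execution_times(executions)
    return len(times), (mean(times) if times else 0.0)


runs = {
    "a1": {"start": 100.0, "end": 104.0},
    "b2": {"start": 200.0, "end": 210.0},
    "c3": {"start": 300.0},  # still running, so excluded
}
print(execution_stats(runs))
```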
- get_results(dataset=None)[source]
Retrieve the experimental results as a DataFrame.
Results can be overall project results or associated with a specific dataset.
- Parameters:
dataset (string, optional) – The dataset against which the results are generated
- Returns:
DataFrame of results
- Return type:
pandas.DataFrame
- get_root_path()[source]
Get the path to where the project folder is located
- Returns:
path : The path to the project folder
- Return type:
String
- get_tags(asset, name, tags)[source]
Retrieve the specified tags for a specific asset. Returns the list of tag values in the same order as requested.
- Parameters:
asset (string, required) – The asset type (experiment|dataset)
name (string, required) – The asset name
tags (list(string)) – The list of tags
- Returns:
tags
- Return type:
list(string)
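The following minimal sketch illustrates the ordering guarantee. It assumes tags are held in a nested dictionary keyed by asset type and then by asset name, which is a hypothetical layout. It also assumes that missing tags are returned as empty strings.

```python
def get_tags_sketch(store, asset, name, tags):
    """Return tag values in the same order as the requested tag names."""
    asset_tags = store.get(asset, {}).get(name, {})
    return [asset_tags.get(tag, "") for tag in tags]


store = {"experiment": {"exp_a": {"model": "lr", "owner": "ann"}}}
print(get_tags_sketch(store, "experiment", "exp_a", ["owner", "model", "stage"]))
```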
- initiate_lock()[source]
Acquire the project lock. Lock files are used during processes that modify the project so that the project state stays consistent across parallel executions.
- Returns:
None
- Return type:
None
- release_lock()[source]
Lock files are used during processes that modify the project so that the project state stays consistent across parallel executions. Release the lock by deleting the lock file.
- Returns:
None
- Return type:
None
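A common way to implement this pattern is an exclusive-create lock file. Creating the file fails if another process already holds the lock, and deleting the file releases it. The sketch below uses os.open with O_CREAT | O_EXCL. The lock file name, timeout and poll interval are assumptions, not projit's actual values.

```python
import os
import tempfile
import time


def acquire_lock(lock_path, timeout=10.0, poll=0.05):
    """Atomically create lock_path, retrying until timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails with FileExistsError if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire lock {lock_path}")
            time.sleep(poll)


def release_lock(lock_path):
    """Release the lock by deleting the lock file."""
    os.remove(lock_path)


lock = os.path.join(tempfile.mkdtemp(), "projit.lock")
acquire_lock(lock)
held = os.path.exists(lock)
try:
    # A second acquisition attempt times out while the lock is held.
    acquire_lock(lock, timeout=0.1)
    second_blocked = False
except TimeoutError:
    second_blocked = True
release_lock(lock)
```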
- reload()[source]
Reload the project metadata from disk. This is necessary when multiple processes are running experiments in the same project and we want to avoid overwriting data.
- Returns:
None
- Return type:
None
- render(path)[source]
Render the project data into a PDF file
- Parameters:
path (string, required) – The path to write the PDF to
- Returns:
None
- Return type:
None
- rm_dataset(name)[source]
Remove a named dataset from the project.
- Parameters:
name – The dataset name (or ‘.’ for all datasets)
- Returns:
None
- Return type:
None
- rm_experiment(name)[source]
Remove a named experiment from the project.
- Parameters:
name – The experiment name (or ‘.’ for all experiments)
- Returns:
None
- Return type:
None
- save()[source]
Save the projit project into config files within the projit config directory.
- Returns:
None
- Return type:
None
- start_experiment(name, path, params={}, tags={})[source]
Start an experiment execution. If this is the first execution, a new experiment is created; otherwise a new execution record is added. The function returns a unique identifier for the execution, which is required to end the execution in a call to projit.Projit.end_experiment().
- Parameters:
name (string, required) – The experiment name (Unique Identifier)
path (string, required) – The path to the experiment script being executed
params (Dictionary, optional) – Optional dictionary of parameters used in the experiment execution
tags (Dictionary, optional) – Optional dictionary of tags to describe the experiment
- Returns:
id : The Execution ID
- Return type:
String
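The start/end pairing can be illustrated with a small, self-contained tracker. This is a hypothetical stand-in, not the Projit class. It assumes the execution ID is a SHA-256 hash of the experiment name, start time and execution count. It also stores records in memory rather than in the project config.

```python
import hashlib
import time


class ExecutionTracker:
    """Hypothetical in-memory tracker mimicking start/end execution pairing."""

    def __init__(self):
        self.executions = {}  # {experiment_name: {execution_id: record}}

    def start_experiment(self, name, params=None):
        started = time.time()
        count = len(self.executions.get(name, {}))
        # Hash of name + start time + count serves as a unique execution ID.
        key = f"{name}:{started}:{count}".encode()
        exec_id = hashlib.sha256(key).hexdigest()
        self.executions.setdefault(name, {})[exec_id] = {
            "start": started,
            "params": dict(params or {}),
        }
        return exec_id

    def end_experiment(self, name, exec_id, hyperparams=None):
        record = self.executions[name][exec_id]  # KeyError if never started
        record["end"] = time.time()
        record["hyperparams"] = dict(hyperparams or {})
        return record["end"] - record["start"]


tracker = ExecutionTracker()
run_id = tracker.start_experiment("baseline", params={"seed": 42})
# ... run the experiment here ...
elapsed = tracker.end_experiment("baseline", run_id, hyperparams={"lr": 0.1})
```

Using None rather than {} as the default for params and hyperparams avoids Python's shared mutable default pitfall.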
- projit.projit.init(template, name, desc='')[source]
Initialise a new projit project. Create the config directory and write the project config there.
- Parameters:
template – The project template to use when initialising
name (string, required) – The name of the project
desc (string, optional) – The project description
- Returns:
Projit Object
- Return type:
Projit
- projit.projit.load(config_path)[source]
This function instantiates a Projit project from an existing config_path. The config path must contain the required config file with the required fields.
Note: This function always overwrites the path variable in the object so that the instance knows where it is located relative to the config directory.
- Parameters:
config_path (string, required) – The path to the projit configuration
- Returns:
Projit Object
- Return type:
Projit
projit.template module
- projit.template.end_profile(proc_name)[source]
End the profiling of a named process
- Returns:
None
- Return type:
None
- projit.template.eprint(*args, **kwargs)[source]
Internal utility function for printing messages to STDERR
- Parameters:
args (list(string), required) – List of strings to print
kwargs (dictionary(String:String), optional) – Keyword arguments passed to the print function
- Returns:
None
- Return type:
None
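A common implementation of this idiom forwards all arguments to print with file=sys.stderr. The sketch below assumes that is what eprint does. It captures STDERR with contextlib.redirect_stderr to show where the message goes.

```python
import io
import sys
from contextlib import redirect_stderr


def eprint(*args, **kwargs):
    """Print to STDERR, forwarding any print() keyword arguments."""
    print(*args, file=sys.stderr, **kwargs)


buffer = io.StringIO()
with redirect_stderr(buffer):
    # sep is forwarded to print() unchanged.
    eprint("warning", "lock held", sep=": ")
print(repr(buffer.getvalue()))
```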
- projit.template.initialise_profile()[source]
Initialise the profiles
- Returns:
None
- Return type:
None
- projit.template.load_template(filename)[source]
Utility function to load a project template from a file
- projit.template.padded(k, padto=20)[source]
Internal utility function to pad a string
- Parameters:
k (String, required) – The String of characters to pad out
padto (Int, optional) – The number of characters to pad out to
- Returns:
padded_string
- Return type:
String
projit.utils module
- projit.utils.create_properties(project_name, descrip)[source]
Create an initial properties Dictionary for project config
- Parameters:
project_name (String, required) – The project name
descrip (String, required) – The description of the project
- Returns:
The project config object
- Return type:
Dictionary(String:String)
- projit.utils.get_data_config(pathway)[source]
Internal utility function for getting the path to the metadata file containing datasets.
- Returns:
Path
- Return type:
String
- projit.utils.get_experiments(pathway)[source]
Internal utility function for getting the path to the metadata file containing experiments and executions.
- Returns:
Path
- Return type:
String
- projit.utils.get_properties(pathway)[source]
Get the properties file
- Parameters:
pathway (String, required) – Path to the file location name
- Returns:
The project config object
- Return type:
Dictionary(String:String)
- projit.utils.initialise_project(name, descrip)[source]
Initialise the project
- Parameters:
name (String, required) – The project name
descrip (String, required) – The description of the project
- Returns:
None
- Return type:
None
- projit.utils.locate_projit_config()[source]
Find the path to a projit project config, or return an empty string if none is found. This lets commands run against a project quickly locate the configuration.
- Returns:
path : The path to the projit project folder
- Return type:
String
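The following is a minimal sketch of this search. It assumes the configuration lives in a directory named .projit, which is an assumption, not a confirmed projit convention. The sketch walks from a starting directory up to the filesystem root. It returns the config directory path, or an empty string if no config is found.

```python
import os
import tempfile


def locate_config_sketch(start=None, config_dir=".projit"):
    """Return the first ancestor's config_dir path, or '' if none exists."""
    path = os.path.abspath(start or os.getcwd())
    while True:
        candidate = os.path.join(path, config_dir)
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return ""
        path = parent


root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, ".projit"))
nested = os.path.join(root, "experiments", "exp_a")
os.makedirs(nested)
found = locate_config_sketch(nested)
```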
- projit.utils.open_config(filename)[source]
Internal utility function for getting config object
- Parameters:
filename (String, required) – The filename to open
- Returns:
config
- Return type:
Dictionary
- projit.utils.walk_up(bottom)[source]
Function to mimic os.walk, but walk ‘up’ instead of down the directory tree
- Parameters:
bottom (String, required) – The path to the bottom of the directory tree.
- Returns:
An iterator over strings for all paths
- Return type:
Iterator(String)
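The following is a minimal sketch of an upward counterpart to os.walk. Following the documented return type, it yields plain path strings. It starts at bottom and ends at the filesystem root. It only manipulates path strings and does not check that the directories exist. The name walk_up_sketch is hypothetical.

```python
import os


def walk_up_sketch(bottom):
    """Yield bottom and each of its ancestor directories, ending at the root."""
    path = os.path.abspath(bottom)
    while True:
        yield path
        parent = os.path.dirname(path)
        if parent == path:  # dirname of the root is the root itself
            return
        path = parent


paths = list(walk_up_sketch(os.path.join(os.sep, "home", "user", "project")))
```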