projit package

Submodules

projit.ascii_plot module

projit.ascii_plot.arange(beg, end, step)[source]: Utility function to emulate arange from earlier python versions

projit.ascii_plot.ascii_plot(ydata, xdata=None, logscale=False, pch='o', xlabel='X', ylabel='Y', width=72, height=50)[source]

Generate an ASCII art plot of a set of data points.

This function was taken from the GitHub gist https://gist.github.com/fransua/6165813. It was modified to work with Python 3, provide neater formatting of the tick labels, and fix some problems with extreme values occasionally being omitted.

Note: The plot title was removed because this function is called from within functions that precede the calls with their own titles.

Parameters:

ydata – list of values to be plotted
xdata (None) – x coordinates corresponding to ydata. If None, x ranges from 1 to the length of ydata.
logscale (False) – display data with logarithmic Y axis
pch ('o') – string used to draw the points (for example +, =, -, *)
xlabel ('X') – label for the X axis
ylabel ('Y') – label for the Y axis
width (72) – width in characters
height (50) – height in characters

Returns:

string corresponding to plot
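
The general technique behind a character plot can be sketched as follows. This is a minimal illustration, not projit's implementation: the helper name `tiny_ascii_plot` and its defaults are hypothetical, and it leaves out the axes, tick labels and logarithmic scaling that `ascii_plot` provides. It only scales each point onto a character grid and joins the rows into a string.

```python
def tiny_ascii_plot(ydata, xdata=None, pch='o', width=20, height=5):
    """Return a string with one marker per data point on a width x height grid."""
    if xdata is None:
        # Mirror the documented default: x runs from 1 to len(ydata).
        xdata = list(range(1, len(ydata) + 1))
    xmin, xmax = min(xdata), max(xdata)
    ymin, ymax = min(ydata), max(ydata)
    # Avoid division by zero when all values are identical.
    xspan = (xmax - xmin) or 1
    yspan = (ymax - ymin) or 1
    grid = [[' '] * width for _ in range(height)]
    for x, y in zip(xdata, ydata):
        col = round((x - xmin) / xspan * (width - 1))
        row = round((y - ymin) / yspan * (height - 1))
        # Row 0 is the top of the plot, so flip the y index.
        grid[height - 1 - row][col] = pch
    return '\n'.join(''.join(line) for line in grid)


plot = tiny_ascii_plot([1, 4, 9, 16, 25])
print(plot)
```

The largest value lands in the top-right corner and the smallest in the bottom-left, which is the same orientation a reader would expect from a conventional chart.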

projit.cli module

projit.cli.cli_main()[source]

projit.cli.extract_max_tags_lengths(project, asset, tags)[source]

CLI Internal Function: determine the maximum length of the content inside a specific set of tags on an asset in the project.

Parameters:

project (Projit, required) – The projit project object
asset (String, required) – The asset type
tags (list(String), required) – The tags to search for

Returns:

List of tag lengths

Return type:

list(Int)
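
As an illustration of the idea, the sketch below computes one maximum length per requested tag. It assumes, purely for the example, that tags are held as a dictionary of dictionaries keyed by asset name; the function name `max_tag_lengths` and that layout are hypothetical and may not match how projit stores tags.

```python
def max_tag_lengths(tagged_assets, tags):
    """For each requested tag, return the longest value length across assets."""
    lengths = []
    for tag in tags:
        # Assets without the tag contribute an empty string (length 0).
        values = [str(asset_tags.get(tag, '')) for asset_tags in tagged_assets.values()]
        lengths.append(max((len(v) for v in values), default=0))
    return lengths


experiments = {
    'baseline': {'model': 'logistic', 'owner': 'ann'},
    'boosted': {'model': 'xgboost-tuned'},
}
widths = max_tag_lengths(experiments, ['model', 'owner'])
print(widths)
```

Lengths of this kind are typically used as column widths when printing a table of assets with their tags.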

projit.cli.filler(current, max_len, content=' ')[source]

Internal function to fill a string with spaces to max_len

Parameters:

current (Int, required) – The length of the current content
max_len (Int, required) – The maximum string length
content (Char, optional) – The character to fill with (default ‘ ‘)

Returns:

filled_content

Return type:

String
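
A minimal sketch of this kind of padding helper is shown below. Because the documented `current` argument is a length rather than a string, the sketch assumes the function returns only the fill characters; that reading, and the name `filler_sketch`, are assumptions rather than a description of the actual code.

```python
def filler_sketch(current, max_len, content=' '):
    """Return the fill characters needed to pad `current` characters up to `max_len`."""
    # Never return a negative amount of padding.
    return content * max(0, max_len - current)


cell = 'rf_model'
row = cell + filler_sketch(len(cell), 12) + '|'
print(row)
```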

projit.cli.main()[source]

projit.cli.print_header(header)[source]

projit.cli.print_results_latex(title, df)[source]

LaTeX output. This is kept in a central function in case the functionality or format changes in the future.

Parameters:

title (String, required) – The table title
df (DataFrame, required) – The dataframe to print out

Returns:

None

Return type:

None

projit.cli.print_results_markdown(title, df)[source]

projit.cli.print_usage(prog)[source]: Command line application usage instructions.

projit.cli.task_add(project, asset, name, path)[source]: Add elements to a project from the command line

projit.cli.task_compare(project, datasets, metric, format, precision)[source]

CLI Internal Task Function: Compare results across multiple datasets.

This command loads the results for each dataset and extracts just the records for the specified metric to compile the comparison dataset to display.

Parameters:

project (Projit, required) – The projit project object
datasets (list(String), required) – The list of datasets to compare
metric (String, required) – The metric to use for comparison
format (String, required) – The output format (markdown|latex|default)
precision (Int, required) – The precision for results in the table

Returns:

None

Return type:

None
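
The compilation step described above can be sketched in plain Python. The record layout (dictionaries with `experiment`, `metric` and `value` keys) and the function name `compare_metric` are hypothetical stand-ins chosen for the example; projit itself works with pandas DataFrames, so this only illustrates the filtering and pivoting logic.

```python
def compare_metric(results_by_dataset, metric, precision=3):
    """Build {experiment: {dataset: value}} for one metric across datasets."""
    table = {}
    for dataset, records in results_by_dataset.items():
        for rec in records:
            if rec['metric'] != metric:
                continue  # keep only the requested metric
            row = table.setdefault(rec['experiment'], {})
            row[dataset] = round(rec['value'], precision)
    return table


results = {
    'train': [
        {'experiment': 'baseline', 'metric': 'rmse', 'value': 0.51234},
        {'experiment': 'baseline', 'metric': 'mae', 'value': 0.4},
    ],
    'test': [
        {'experiment': 'baseline', 'metric': 'rmse', 'value': 0.61239},
    ],
}
comparison = compare_metric(results, 'rmse', precision=2)
print(comparison)
```

Each experiment becomes one row and each dataset one column, which is the shape needed to render a comparison table in markdown or LaTeX.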

projit.cli.task_init(name, template='')[source]

CLI Internal Task Function: Initialise a project from the command line. This function will initiate a project with a blank description. Users will need to update this in a subsequent interaction.

Parameters:

name (String, required) – The name of the project
template (String, optional) – The name of the template to use when initialising

Returns:

None

Return type:

None

projit.cli.task_list(subcmd, project, dataset, format, precision, tags)[source]: CLI Internal Task Function: List content of a project from the command line

projit.cli.task_plot(project, experiment, property, metric)[source]

projit.cli.task_render(project, path)[source]

Generates a pdf and writes it to the provided path

Parameters:

project (Projit, required) – The projit project object
path (String, required) – The rendering path

projit.cli.task_rm(project, asset, name)[source]: Remove elements from a project from the command line

projit.cli.task_status(project)[source]

CLI Internal Task Function: Print the project properties to the command line

Parameters: project (Projit, required) – The projit project object
Returns: None
Return type: None

projit.cli.task_tag(project, asset, name, values)[source]: Add tags to an asset in the project from the command line

projit.cli.task_update(project)[source]

CLI Internal Task Function: Update a project from the command line

This function invokes an interaction via the terminal to update the project properties.

Returns: None
Return type: None

projit.config module

projit.latex_table module

Support function for generating a LaTeX table from a pandas DataFrame. This function removes the need for additional dependencies.

projit.latex_table.clean_data_for_latex(input)[source]

This utility function is required because some strings might contain LaTeX special characters, and therefore need to be escaped before LaTeX rendering will work.
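
One common way to perform this kind of escaping is sketched below. The name `escape_latex` and the replacement table are illustrative assumptions; the set of characters that `clean_data_for_latex` actually handles is not stated here and may differ.

```python
import re

# Standard LaTeX special characters and one common way to escape each.
LATEX_ESCAPES = {
    '\\': r'\textbackslash{}',
    '&': r'\&',
    '%': r'\%',
    '$': r'\$',
    '#': r'\#',
    '_': r'\_',
    '{': r'\{',
    '}': r'\}',
    '~': r'\textasciitilde{}',
    '^': r'\textasciicircum{}',
}
_PATTERN = re.compile('|'.join(re.escape(ch) for ch in LATEX_ESCAPES))


def escape_latex(text):
    """Escape LaTeX special characters in a single pass."""
    # A single regex pass avoids re-escaping braces introduced by earlier replacements.
    return _PATTERN.sub(lambda m: LATEX_ESCAPES[m.group(0)], str(text))


escaped = escape_latex('f1_score: 95% & rising')
print(escaped)
```

Doing the substitution in one pass matters: chained `str.replace` calls would escape the braces that an earlier replacement (such as the one for the backslash) had just inserted.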

projit.latex_table.print_latex(df, title)[source]

projit.pdf module

class projit.pdf.PDF(orientation='P', unit='mm', format='A4')[source]

Bases: FPDF

add_description(description)[source]

add_title(title)[source]

setup()[source]

projit.projit module

class projit.projit.Projit(path, name, desc='', experiments=[], datasets={}, results={}, params={}, hyperparams={}, dataresults={}, executions={}, tags={})[source]

Bases: object

Projit Class. This is a data structure to contain the core elements of a data science project. It will permit loose coupling between processes and experiments but provide a simple overarching structure for communication and documentation.

add_dataset(name, path)[source]

Add a named dataset to the project.

Parameters:

name (string, required) – The dataset name
path (string, required) – The path to the data set (either local path, URL or S3 Bucket)

Returns:

None

Return type:

None

add_experiment(name, path)[source]

Add information of a new experiment to the project. Then save the project configuration. This function will overwrite an experiment of the same name and delete any previous results.

Parameters:

name (string, required) – The experiment name
path (string, required) – The path to the experiment.

Returns:

None

Return type:

None

add_hyperparam(name, value)[source]

Add a set of hyper parameters to the project.

Parameters:

name (string, required) – The experiment name
value (Dictionary) – The Dictionary of hyperparameters

Returns:

None

Return type:

None

add_param(name, value)[source]

Add a parameter to the project.

Parameters:

name (string, required) – The parameter name
value (Any) – The value taken by that parameter

Returns:

None

Return type:

None

add_result(experiment, metric, value, dataset=None)[source]

Add results from an experiment to the project.

They can be overall project results, or associated with a specific dataset

Parameters:

experiment (string, required) – The experiment name
metric (string, required) – The name of the metric we are adding.
value (float, required) – The value of the metric to add.
dataset (string, optional) – The dataset against which the results are generated

Returns:

None

Return type:

None

add_tags(asset, name, tags)[source]

Add tags to a specific asset

Parameters:

asset (string, required) – The asset type (experiment|dataset)
name (string, required) – The asset name
tags (Dictionary(string:string)) – The dictionary of tags

Returns:

None

Return type:

None

clean_experimental_results(name)[source]

Remove all results for a given experiment

Parameters: name (string, required) – The experiment name
Returns: None
Return type: None

create_local_path(ds)[source]

Create and return a path to a dataset. Internal use.

Returns: Path to dataset
Return type: String

dataset_exists(name)[source]

Check if a given dataset is in the data structure

Parameters: name (string, required) – The dataset name
Returns: exists
Return type: Boolean

end_experiment(name, id, hyperparams={})[source]

End an experiment execution. This function requires both the experiment name and the hash ID of the previously started execution.

Parameters:

name (string, required) – The experiment name (Unique Identifier)
id (string, required) – The execution hash ID returned by the function start_experiment
hyperparams – Optional dictionary of hyperparameters used in the experiment execution

Returns:

None

Return type:

None

experiment_exists(name)[source]

Check if a given experiment is in the data structure

Parameters: name (string, required) – The experiment name
Returns: exists
Return type: Boolean

get_dataset(name)[source]

Retrieve the dataset by name.

Parameters: name (string, required) – The dataset to retrieve
Returns: Path to dataset
Return type: String

get_execution_times(name)[source]

Given an experiment name, return a list of all execution times.

Parameters: name (string, required) – The experiment name (Unique Identifier)
Returns: execution_times : Array of execution times
Return type: list(float)

get_experiment_execution_stats(name)[source]

Given an experiment name, return the execution statistics.

Parameters: name (string, required) – The experiment name (Unique Identifier)
Returns: executions, mean_execution_time : A pair of statistics
Return type: int, float
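
The pair of statistics can be derived from a list of execution times as in the sketch below. The function name `execution_stats` is hypothetical, and returning `(0, 0.0)` for an experiment with no recorded executions is an assumption made for the example, not documented behaviour.

```python
from statistics import mean


def execution_stats(execution_times):
    """Return (number of executions, mean execution time) for a list of durations."""
    if not execution_times:
        # No recorded executions: report a zero count and a zero mean.
        return 0, 0.0
    return len(execution_times), mean(execution_times)


count, avg = execution_stats([12.0, 15.0, 18.0])
print(count, avg)
```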

get_hyperparam(name)[source]

get_mean_execution_time(name)[source]

Given an experiment name, return the mean execution time.

Parameters: name (string, required) – The experiment name (Unique Identifier)
Returns: mean_execution_time : The mean time of execution
Return type: float

get_param(name)[source]

get_path_to_dataset(name)[source]

get_results(dataset=None)[source]

Retrieve the experimental results as a DataFrame.

They can be overall project results, or associated with a specific dataset

Parameters: dataset (string, optional) – The dataset against which the results are generated
Returns: DataFrame of results
Return type: pandas.DataFrame

get_root_path()[source]

Get the path to where the project folder is located

Returns: path : The Path to the Project folder
Return type: String

get_tags(asset, name, tags)[source]

Retrieve the specified tags for a specific asset. Returns the list of tag values in the same order as requested.

Parameters:

asset (string, required) – The asset type (experiment|dataset)
name (string, required) – The asset name
tags (list(string)) – The list of tags

Returns:

projit.template module

projit.template.end_profile(proc_name)[source]

End the profiling of a named process

Returns: None
Return type: None

projit.template.eprint(*args, **kwargs)[source]

Utility internal function for easy printing of messages to STDERR

Parameters:

args (list(string), required) – List of strings to print
kwargs (dictionary(String:String), required) – Keyword arguments for print function

Returns:

None

Return type:

None

projit.template.initialise_profile()[source]

Initialise the profiles

Returns: None
Return type: None

projit.template.load_template(filename)[source]: Utility function to load a project template from a file

projit.template.padded(k, padto=20)[source]

Internal utility function to pad a string

Parameters:

k (String, required) – The String of characters to pad out
padto (Int, optional) – The number of characters to pad out to

Returns:

padded_string

Return type:

String

projit.template.print_profiles()[source]

Print the result of the profiling of processes

Returns: None
Return type: None

projit.template.start_profile(proc_name)[source]

Start the profile of named process

Returns: None
Return type: None

projit.utils module

projit.utils.create_properties(project_name, descrip)[source]

Create an initial properties Dictionary for project config

Parameters:

project_name (String, required) – The project name
descrip (String, required) – The description of the project

Returns:

The project config object

Return type:

Dictionary(String:String)

projit.utils.get_data_config(pathway)[source]

Internal utility function for getting the path to the meta-data file containing datasets.

Returns: Path
Return type: String

projit.utils.get_experiments(pathway)[source]

Internal utility function for getting the path to the meta-data file containing experiments and executions.

Returns: Path
Return type: String

projit.utils.get_properties(pathway)[source]

Get the properties file

Parameters: pathway (String, required) – Path to the file location name
Returns: The project config object
Return type: Dictionary(String:String)

projit.utils.initialise_project(name, descrip)[source]

Initialise the project

Parameters:

name (String, required) – The project name
descrip (String, required) – The description of the project

Returns:

None

Return type:

None

projit.utils.locate_projit_config()[source]

Find a path to a projit project config, or return empty string. Required so that commands run against a project can quickly locate the configuration.

Returns: path : The Path to the projit Project folder
Return type: String

projit.utils.open_config(filename)[source]

Internal utility function for getting config object

Parameters: filename (String, required) – The filename to open
Returns: config
Return type: Dictionary

projit.utils.walk_up(bottom)[source]

Function to mimic os.walk, but walk ‘up’ instead of down the directory tree

Parameters: bottom (String, required) – The path to the bottom of the directory tree.
Returns: An iterator over strings for all paths
Return type: Iterator(String)
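
Walking up a directory tree, and using that walk to locate a project folder, can be sketched as below. The names `walk_up_sketch` and `locate_marker` and the marker name `.demo_marker` are hypothetical; the sketch yields plain path strings as the documented return type suggests, and the actual function may yield richer values in the style of os.walk.

```python
import os
import tempfile


def walk_up_sketch(bottom):
    """Yield `bottom` and then each parent directory up to the filesystem root."""
    current = os.path.abspath(bottom)
    while True:
        yield current
        parent = os.path.dirname(current)
        if parent == current:
            # dirname of the root is the root itself, so stop here.
            return
        current = parent


def locate_marker(start, marker):
    """Return the first directory at or above `start` containing `marker`, else ''."""
    for path in walk_up_sketch(start):
        if os.path.exists(os.path.join(path, marker)):
            return path
    return ''


# Demonstration in a throwaway directory tree; '.demo_marker' is a hypothetical name.
with tempfile.TemporaryDirectory() as root:
    root = os.path.realpath(root)
    nested = os.path.join(root, 'a', 'b')
    os.makedirs(nested)
    os.mkdir(os.path.join(root, '.demo_marker'))
    found = locate_marker(nested, '.demo_marker')
    found_root = (found == root)
print(found_root)
```

Returning an empty string when nothing is found matches the behaviour described for locate_projit_config, which lets callers test the result directly.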

projit.utils.write_config(config, filename)[source]

Internal utility function for writing config object

Parameters:

config (Dictionary, required) – The config file to save
filename (String, required) – The filename to save it to

Returns:

None

Return type:

None

projit.utils.write_properties(pathway, props)[source]

Write properties file to a given path

Parameters:

pathway (String, required) – The pathway to write to
props (Dictionary(String), required) – The properties object

Returns:

None

Return type:

None
