projit package

Submodules

projit.ascii_plot module

projit.ascii_plot.arange(beg, end, step)[source]

Utility function to emulate arange from earlier Python versions.
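The implementation is not reproduced on this page. As a rough sketch of what such a helper might do, assuming a half-open interval (as in numpy.arange) and a positive step; the name arange_sketch is hypothetical and not part of projit:

```python
def arange_sketch(beg, end, step):
    """Return beg, beg + step, beg + 2*step, ... for values strictly below end.

    Hypothetical stand-in, not the projit implementation.
    """
    if step <= 0:
        raise ValueError("step must be positive in this sketch")
    values = []
    i = 0
    # Multiply instead of accumulating to limit floating-point drift.
    while beg + i * step < end:
        values.append(beg + i * step)
        i += 1
    return values


print(arange_sketch(0, 1, 0.25))
```

Whether the real function includes the end point or accepts negative steps is not stated here, so check the source before relying on either behaviour.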

projit.ascii_plot.ascii_plot(ydata, xdata=None, logscale=False, pch='o', xlabel='X', ylabel='Y', width=72, height=50)[source]

Generate an ASCII art plot of a set of data points.

This function was taken from the GitHub gist https://gist.github.com/fransua/6165813. It was modified to work with Python 3, provide neater formatting on the tick labels, and fix some problems with extreme values occasionally being omitted.

Note: The plot title was removed because this function is called from functions that print their own titles before the call.

Parameters:
  • ydata – list of values to be plotted

  • xdata (None) – x coordinates corresponding to ydata. If None, they range from 1 to the length of ydata.

  • logscale (False) – display data with logarithmic Y axis

  • pch ('o') – character used to draw the points (for example +, =, -, *)

  • title ('plot') – string for title of the plot (not present in the signature above; see the note on the removed title)

  • xlabel ('X') – label for the X axis

  • ylabel ('Y') – label for the Y axis

  • width (72) – width in characters

  • height (50) – height in characters

Returns:

string corresponding to plot
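To illustrate the core idea only (mapping data points onto a character grid and returning a string), here is a minimal sketch. It is not the projit implementation: it draws no axes, tick labels or log scale, and the name tiny_ascii_plot is hypothetical.

```python
def tiny_ascii_plot(ydata, xdata=None, pch="o", width=20, height=5):
    """Return a string with one `pch` per data point on a width x height grid."""
    if xdata is None:
        xdata = list(range(1, len(ydata) + 1))  # 1..len(ydata), as documented above
    xmin, xmax = min(xdata), max(xdata)
    ymin, ymax = min(ydata), max(ydata)
    xspan = (xmax - xmin) or 1  # avoid division by zero for constant data
    yspan = (ymax - ymin) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xdata, ydata):
        col = round((x - xmin) / xspan * (width - 1))
        row = round((y - ymin) / yspan * (height - 1))
        grid[height - 1 - row][col] = pch  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


print(tiny_ascii_plot([1, 4, 9, 16, 25]))
```

Scaling to width - 1 and height - 1 keeps the minimum and maximum values on the grid, which is the kind of extreme-value handling the description above mentions.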

projit.cli module

projit.cli.cli_main()[source]
projit.cli.extract_max_tags_lengths(project, asset, tags)[source]

CLI Internal Function: determine the maximum length of the content inside a specific set of tags on an asset in the project.

Parameters:
  • project (Projit, required) – The projit project object

  • asset (String, required) – The asset type

  • tags (list(String), required) – The tags to search for

Returns:

List of tag lengths

Return type:

list(Int)
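A sketch of the calculation described above, assuming the assets of one type are available as a plain dictionary mapping asset names to tag dictionaries. That layout, the empty string used for a missing tag, and the function name are assumptions made for illustration.

```python
def max_tag_lengths_sketch(assets, tags):
    """assets: {asset_name: {tag: value}}; returns one maximum length per tag."""
    lengths = []
    for tag in tags:
        longest = 0
        for asset_tags in assets.values():
            # A missing tag counts as an empty string in this sketch.
            longest = max(longest, len(str(asset_tags.get(tag, ""))))
        lengths.append(longest)
    return lengths


experiments = {
    "baseline": {"model": "logistic", "owner": "ann"},
    "deep": {"model": "mlp"},
}
print(max_tag_lengths_sketch(experiments, ["model", "owner"]))
```

Lengths like these are typically used to size the columns of a text table so that every row lines up.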

projit.cli.filler(current, max_len, content=' ')[source]

Internal function to fill a string with spaces to max_len

Parameters:
  • current (Int, required) – The length of the current content

  • max_len (Int, required) – The maximum string length

  • content (Char, optional) – The character to fill with (default ‘ ‘)

Returns:

filled_content

Return type:

String
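Because the function takes the current length rather than the string itself, it presumably returns only the padding. A minimal sketch under that assumption (filler_sketch is a hypothetical name, and the real function may behave differently when current exceeds max_len):

```python
def filler_sketch(current, max_len, content=" "):
    """Return the padding needed to grow `current` characters to `max_len`."""
    return content * max(0, max_len - current)


name = "rmse"
line = name + filler_sketch(len(name), 10) + "|"
print(line)
```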

projit.cli.main()[source]
projit.cli.print_header(header)[source]
projit.cli.print_results_latex(title, df)[source]

LaTeX output. This is kept in a central function in case the functionality or format changes in the future.

Parameters:
  • title (String, required) – The table title

  • df (DataFrame, required) – The dataframe to print out

Returns:

None

Return type:

None

projit.cli.print_results_markdown(title, df)[source]
projit.cli.print_usage(prog)[source]

Command line application usage instructions.

projit.cli.task_add(project, asset, name, path)[source]

Add elements to a project from the command line

projit.cli.task_compare(project, datasets, metric, format, precision)[source]

CLI Internal Task Function: Compare results across multiple datasets.

This command loads the results for each dataset and extracts just the records for the specified metric to compile the comparison dataset to display.

Parameters:
  • project (Projit, required) – The projit project object

  • datasets (list(String), required) – The list of datasets to compare

  • metric (String, required) – The metric to use for comparison

  • format (String, required) – The output format (markdown|latex|default)

  • precision (Int, required) – The precision for results in the table

Returns:

None

Return type:

None
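The compile step described above can be sketched with plain dictionaries. The record layout (a list of (experiment, metric, value) tuples per dataset) and the name compare_sketch are assumptions for illustration; the real command works on the project's stored results and prints a table rather than returning one.

```python
def compare_sketch(results_by_dataset, metric, precision):
    """Build {experiment: {dataset: rounded value}} for a single metric."""
    table = {}
    for dataset, records in results_by_dataset.items():
        for experiment, rec_metric, value in records:
            if rec_metric != metric:
                continue  # keep only the requested metric
            table.setdefault(experiment, {})[dataset] = round(value, precision)
    return table


results = {
    "train": [("baseline", "rmse", 0.5126), ("baseline", "mae", 0.4)],
    "test": [("baseline", "rmse", 0.6349), ("deep", "rmse", 0.5871)],
}
comparison = compare_sketch(results, "rmse", 2)
print(comparison)
```

An experiment with no result for a dataset simply has no entry for it, which a table renderer would show as an empty cell.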

projit.cli.task_init(name, template='')[source]

CLI Internal Task Function: Initialise a project from the command line. This function will initiate a project with a blank description. Users will need to update this in a subsequent interaction.

Parameters:
  • name (String, required) – The name of the project

  • template (String, optional) – The name of the template to use when initialising

Returns:

None

Return type:

None

projit.cli.task_list(subcmd, project, dataset, format, precision, tags)[source]

CLI Internal Task Function: List content of a project from the command line

projit.cli.task_plot(project, experiment, property, metric)[source]
projit.cli.task_render(project, path)[source]

Generates a PDF and writes it to the provided path.

Parameters:
  • project (Projit, required) – The projit project object

  • path (String, required) – The rendering path

projit.cli.task_rm(project, asset, name)[source]

Remove elements from a project from the command line

projit.cli.task_status(project)[source]

CLI Internal Task Function: Print the project properties to the command line

Parameters:

project (Projit, required) – The projit project object

Returns:

None

Return type:

None

projit.cli.task_tag(project, asset, name, values)[source]

Add tags to an asset in the project from the command line

projit.cli.task_update(project)[source]

CLI Internal Task Function: Update a project from the command line

This function invokes an interaction via the terminal to update the project properties.

Returns:

None

Return type:

None

projit.config module

projit.latex_table module

Support function for generating a LaTeX table from a pandas DataFrame. This function negates the need for additional dependencies.

projit.latex_table.clean_data_for_latex(input)[source]
This utility function is required because some strings might contain LaTeX special characters, and therefore need to be escaped before LaTeX rendering will work.
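Which characters the function escapes is not listed here. The sketch below covers the ten characters LaTeX treats specially in text mode and is an illustration only; escape_latex_sketch and LATEX_SPECIALS are hypothetical names.

```python
# Replacement for each character LaTeX treats specially in ordinary text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex_sketch(text):
    """Escape LaTeX special characters in `text` (converted to str first)."""
    # Translate one character at a time so that the backslashes and braces
    # inserted by a replacement are never escaped a second time.
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))


print(escape_latex_sketch("50% of R&D_cost"))
```

A chain of str.replace calls would need careful ordering to avoid double escaping, which is why the sketch maps characters in a single pass.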

projit.latex_table.print_latex(df, title)[source]

projit.pdf module

class projit.pdf.PDF(orientation='P', unit='mm', format='A4')[source]

Bases: FPDF

add_description(description)[source]
add_title(title)[source]
setup()[source]

projit.projit module

class projit.projit.Projit(path, name, desc='', experiments=[], datasets={}, results={}, params={}, hyperparams={}, dataresults={}, executions={}, tags={})[source]

Bases: object

Projit Class. This is a data structure to contain the core elements of a data science project. It will permit loose coupling between processes and experiments but provide a simple overarching structure for communication and documentation.

add_dataset(name, path)[source]

Add a named dataset to the project.

Parameters:
  • name (string, required) – The dataset name

  • path (string, required) – The path to the data set (either local path, URL or S3 Bucket)

Returns:

None

Return type:

None

add_experiment(name, path)[source]

Add information of a new experiment to the project. Then save the project configuration. This function will overwrite an experiment of the same name and delete any previous results.

Parameters:
  • name (string, required) – The experiment name

  • path (string, required) – The path to the experiment.

Returns:

None

Return type:

None

add_hyperparam(name, value)[source]

Add a set of hyperparameters to the project.

Parameters:
  • name (string, required) – The experiment name

  • value (Dictionary) – The Dictionary of hyperparameters

Returns:

None

Return type:

None

add_param(name, value)[source]

Add a parameter to the project.

Parameters:
  • name (string, required) – The parameter name

  • value (Any) – The value taken by that parameter

Returns:

None

Return type:

None

add_result(experiment, metric, value, dataset=None)[source]

Add results from an experiment to the project.

They can be overall project results, or associated with a specific dataset

Parameters:
  • experiment (string, required) – The experiment name

  • metric (string, required) – The name of the metric we are adding.

  • value (float, required) – The value of the metric to add.

  • dataset (string, optional) – The dataset against which the results are generated

Returns:

None

Return type:

None

add_tags(asset, name, tags)[source]

Add tags to a specific asset

Parameters:
  • asset (string, required) – The asset type (experiment|dataset)

  • name (string, required) – The asset name

  • tags (Dictionary(string:string)) – The dictionary of tags

Returns:

None

Return type:

None

clean_experimental_results(name)[source]

Remove all results for a given experiment

Parameters:

name (string, required) – The experiment name

Returns:

None

Return type:

None

create_local_path(ds)[source]

Create and return a path to a dataset. Internal use.

Returns:

Path to dataset

Return type:

String

dataset_exists(name)[source]

Check if a given dataset is in the data structure

Parameters:

name (string, required) – The dataset name

Returns:

exists

Return type:

Boolean

end_experiment(name, id, hyperparams={})[source]

End an experiment execution. This function requires both the experiment name and the hash ID of the previously started execution.

Parameters:
  • name (string, required) – The experiment name (unique identifier)

  • id (string, required) – The execution hash ID returned by the function: start_experiment

  • hyperparams – Optional dictionary of hyperparameters used in the experiment execution

Returns:

None

Return type:

None

experiment_exists(name)[source]

Check if a given experiment is in the data structure

Parameters:

name (string, required) – The experiment name

Returns:

exists

Return type:

Boolean

get_dataset(name)[source]

Retrieve the dataset by name.

Parameters:

name (string, required) – The dataset to retrieve

Returns:

Path to dataset

Return type:

String

get_execution_times(name)[source]

Given an experiment name, return a list of all execution times.

Parameters:

name (string, required) – The experiment name (unique identifier)

Returns:

execution_times : Array of execution times

Return type:

list(float)

get_experiment_execution_stats(name)[source]

Given an experiment name, return the execution statistics.

Parameters:

name (string, required) – The experiment name (unique identifier)

Returns:

executions, mean_execution_time : A pair of statistics

Return type:

int, float
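As an illustration of the pair returned above, the sketch below derives a count and a mean from start and end timestamps. The record layout and the decision to ignore executions that were never ended are assumptions; execution_stats_sketch is a hypothetical name.

```python
def execution_stats_sketch(executions):
    """executions: {id: {"start": float, "end": float}} -> (count, mean seconds)."""
    times = [
        rec["end"] - rec["start"]
        for rec in executions.values()
        if "end" in rec  # skip executions that were started but never ended
    ]
    if not times:
        return 0, 0.0
    return len(times), sum(times) / len(times)


runs = {
    "a": {"start": 0.0, "end": 2.0},
    "b": {"start": 10.0, "end": 14.0},
    "c": {"start": 20.0},  # still running
}
print(execution_stats_sketch(runs))
```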

get_hyperparam(name)[source]
get_mean_execution_time(name)[source]

Given an experiment name, return the mean execution time.

Parameters:

name (string, required) – The experiment name (unique identifier)

Returns:

mean_execution_time : The mean time of execution

Return type:

float

get_param(name)[source]
get_path_to_dataset(name)[source]
get_results(dataset=None)[source]

Retrieve the experimental results as a DataFrame.

They can be overall project results, or associated with a specific dataset

Parameters:

dataset (string, optional) – The dataset against which the results are generated

Returns:

DataFrame of results

Return type:

pandas.DataFrame

get_root_path()[source]

Get the path to where the project folder is located

Returns:

path : The Path to the Project folder

Return type:

String

get_tags(asset, name, tags)[source]

Retrieve the specified tags for a specific asset. Returns the list of tag values in the same order as requested.

Parameters:
  • asset (string, required) – The asset type (experiment|dataset)

  • name (string, required) – The asset name

  • tags (list(string)) – The list of tags

Returns:

tags

Return type:

list(string)

initiate_lock()[source]

Lock files are used during processes that modify the project so that we get consistent state across parallel executions.

Returns:

None

Return type:

None

is_complete_path(path)[source]
release_lock()[source]

Lock files are used during processes that modify the project so that we get consistent state across parallel executions. Release the lock by deleting the lock file

Returns:

None

Return type:

None
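How projit creates and waits on its lock file is not described here. One common pattern, shown as a sketch under that caveat, is to create the file atomically with os.O_EXCL so that only one process can succeed. Unlike the methods above, which return None, the hypothetical initiate_lock_sketch returns a boolean so the example can show contention.

```python
import os
import tempfile


def initiate_lock_sketch(lock_path):
    """Try to create the lock file atomically; return True on success."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another process already holds the lock
    os.close(fd)
    return True


def release_lock_sketch(lock_path):
    """Release the lock by deleting the lock file."""
    if os.path.exists(lock_path):
        os.remove(lock_path)


with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, "project.lock")  # hypothetical file name
    first = initiate_lock_sketch(lock)   # acquires the lock
    second = initiate_lock_sketch(lock)  # fails while the lock is held
    release_lock_sketch(lock)
    third = initiate_lock_sketch(lock)   # succeeds again after release
    release_lock_sketch(lock)
    print(first, second, third)
```

A caller that must wait for the lock would retry the acquire step in a loop with a short sleep between attempts.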

reload()[source]

Reload the project meta-data from disk. This is necessary when multiple processes are running experiments in the same project and we want to avoid overwriting data.

Returns:

None

Return type:

None

render(path)[source]

Render the project data into a PDF file

Parameters:

path (string, required) – The path to write the PDF to

Returns:

None

Return type:

None

rm_dataset(name)[source]

Remove a named dataset from the project.

Parameters:

name – The dataset name (or ‘.’ for all datasets)

Returns:

None

Return type:

None

rm_experiment(name)[source]

Remove a named experiment from the project.

Parameters:

name – The experiment name (or ‘.’ for all experiments)

Returns:

None

Return type:

None

save()[source]

Save your projit project into config files within the projit config dir

Returns:

None

Return type:

None

start_experiment(name, path, params={}, tags={})[source]

Start an experiment execution. This function will create a new experiment if this is the first execution; otherwise it will simply add a new execution record. It returns a unique identifier for the execution, which is required to end the execution in a call to projit.Projit.end_experiment().

Parameters:
  • name (string, required) – The experiment name (unique identifier)

  • path (string, required) – The path to the experiment script being executed

  • params (Dictionary, optional) – Optional dictionary of parameters used in the experiment execution

  • tags (Dictionary, optional) – Optional dictionary of tags to describe the experiment

Returns:

id : The Execution ID

Return type:

String
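The start/end handshake can be illustrated with a small in-memory stand-in. ExecutionLogSketch is hypothetical and omits everything projit does around persistence and locking; it only shows why the identifier returned by the start call must be passed to the end call.

```python
import time
import uuid


class ExecutionLogSketch:
    """Hypothetical in-memory stand-in for the execution records."""

    def __init__(self):
        self.executions = {}  # {experiment name: {execution id: record}}

    def start_experiment(self, name, params=None):
        exec_id = uuid.uuid4().hex  # any unique string would do
        record = {"start": time.time(), "params": dict(params or {})}
        self.executions.setdefault(name, {})[exec_id] = record
        return exec_id

    def end_experiment(self, name, exec_id, hyperparams=None):
        record = self.executions[name][exec_id]  # KeyError if never started
        record["end"] = time.time()
        record["hyperparams"] = dict(hyperparams or {})


log = ExecutionLogSketch()
run_id = log.start_experiment("baseline", params={"seed": 1})
# ... the experiment itself would run here ...
log.end_experiment("baseline", run_id, hyperparams={"alpha": 0.1})
print(sorted(log.executions["baseline"][run_id]))
```

The sketch uses None defaults and copies the dictionaries it receives, because a mutable default argument such as {} is shared between calls in Python.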

update_name_description(name, descrip)[source]

Update the core values name and description

Parameters:
  • name (string, required) – The project name

  • descrip (string, required) – The project description

Returns:

None

Return type:

None

validate_asset(asset, name)[source]

Check if a given asset exists

Parameters:
  • asset (string, required) – The asset type (experiment|dataset)

  • name (string, required) – The asset name

Returns:

exists

Return type:

Boolean

projit.projit.init(template, name, desc='')[source]

Initialise a new projit project. Create the config directory and write the project config there.

Parameters:
  • name (string, required) – The name of the project

  • desc (string, optional) – The project description

Returns:

Projit Object

Return type:

Projit

projit.projit.init_template(template)[source]

Initialise a project from a specified template

projit.projit.load(config_path)[source]

This function allows you to instantiate a Projit project from an existing config_path. The config path must contain the required config file that contains the required fields.

Note: This function will always overwrite the path variable in the object so the instance is aware of where it is relative to the config directory.

Parameters:

config_path (string, required) – The path to the projit configuration

Returns:

Projit Object

Return type:

Projit

projit.projit.projit_load()[source]
Load the project by first locating the config file and using it to initialise the projit Project class.

Returns:

Projit Object

Return type:

Projit

projit.template module

projit.template.end_profile(proc_name)[source]

End the profiling of a named process

Returns:

None

Return type:

None

projit.template.eprint(*args, **kwargs)[source]

Utility internal function for easy printing of messages to STDERR

Parameters:
  • args (list(string), required) – List of strings to print

  • kwargs (dictionary(String:String), required) – Keyword arguments for print function

Returns:

None

Return type:

None

projit.template.initialise_profile()[source]

Initialise the profiles

Returns:

None

Return type:

None

projit.template.load_template(filename)[source]

Utility function to load a project template from a file

projit.template.padded(k, padto=20)[source]

Internal utility function to pad a string

Parameters:
  • k (String, required) – The String of characters to pad out

  • padto (Int, optional) – The number of characters to pad out to

Returns:

padded_string

Return type:

String

projit.template.print_profiles()[source]

Print the result of the profiling of processes

Returns:

None

Return type:

None

projit.template.start_profile(proc_name)[source]

Start the profile of named process

Returns:

None

Return type:

None

projit.utils module

projit.utils.create_properties(project_name, descrip)[source]

Create an initial properties Dictionary for project config

Parameters:
  • project_name (String, required) – The project name

  • descrip (String, required) – The description of the project

Returns:

The project config object

Return type:

Dictionary(String:String)

projit.utils.get_data_config(pathway)[source]
Internal utility function for getting the path to the meta-data file containing datasets.

Returns:

Path

Return type:

String

projit.utils.get_experiments(pathway)[source]
Internal utility function for getting the path to the meta-data file containing experiments and executions.

Returns:

Path

Return type:

String

projit.utils.get_properties(pathway)[source]

Get the properties file

Parameters:

pathway (String, required) – Path to the file location name

Returns:

The project config object

Return type:

Dictionary(String:String)

projit.utils.initialise_project(name, descrip)[source]

Initialise the project

Parameters:
  • name (String, required) – The project name

  • descrip (String, required) – The description of the project

Returns:

None

Return type:

None

projit.utils.locate_projit_config()[source]

Find a path to a projit project config, or return empty string. Required so that commands run against a project can quickly locate the configuration.

Returns:

path : The Path to the projit Project folder

Return type:

String

projit.utils.open_config(filename)[source]

Internal utility function for getting config object

Parameters:

filename (String, required) – The filename to open

Returns:

config

Return type:

Dictionary

projit.utils.walk_up(bottom)[source]

Function to mimic os.walk, but walk ‘up’ instead of down the directory tree

Parameters:

bottom (String, required) – The path to the bottom of the directory tree.

Returns:

An iterator over strings for all paths

Return type:

Iterator(String)
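An upward walk of this kind, and a config search built on it in the spirit of locate_projit_config, might look like the sketch below. The function names and the marker directory name are hypothetical; the directory projit actually searches for is not stated here.

```python
import os
import tempfile


def walk_up_sketch(bottom):
    """Yield `bottom`, then each parent directory, ending at the filesystem root."""
    current = os.path.abspath(bottom)
    while True:
        yield current
        parent = os.path.dirname(current)
        if parent == current:  # reached the root
            return
        current = parent


def locate_config_sketch(start, marker):
    """Return the first ancestor of `start` that contains `marker`, else ''."""
    for path in walk_up_sketch(start):
        if os.path.isdir(os.path.join(path, marker)):
            return path
    return ""


with tempfile.TemporaryDirectory() as root:
    root = os.path.realpath(root)
    os.makedirs(os.path.join(root, "cfg_marker"))  # hypothetical marker directory
    nested = os.path.join(root, "src", "models")
    os.makedirs(nested)
    found = locate_config_sketch(nested, "cfg_marker")
    print(found == root)
```

Returning an empty string when nothing is found mirrors the behaviour documented for locate_projit_config, so callers can test the result for truthiness.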

projit.utils.write_config(config, filename)[source]

Internal utility function for writing config object

Parameters:
  • config (Dictionary, required) – The config file to save

  • filename (String, required) – The filename to save it to

Returns:

None

Return type:

None

projit.utils.write_properties(pathway, props)[source]

Write properties file to a given path

Parameters:
  • pathway (String, required) – The pathway to write to

  • props (Dictionary(String), required) – The properties object

Returns:

None

Return type:

None

Module contents