API

General

finch.environment.package_root = '/home/runner/work/finch/finch/src/finch'

The root directory of the finch package.

finch.environment.data_dir = '/home/runner/work/finch/finch/src/finch/data'

The directory of the data files.

finch.environment.proj_config = '/home/runner/work/finch/finch/src/finch/data/config/default.ini'

The location of the project configuration file.

finch.environment.version_file = '/home/runner/work/finch/finch/src/finch/data/VERSION'

The location of the file specifying the version of the finch package.

finch.environment.default_custom_config = 'finch.ini'

The default location for a custom configuration file.

finch.environment.custom_config_env_var = 'CONFIG'

The name of the environment variable specifying the location of a custom configuration file.

finch.environment.node_name_env_var = 'SLURMD_NODENAME'

The name of the environment variable holding the name of the current SLURM node.

finch.environment.get_version() Version

Returns the current version of the finch package.

class finch.environment.WorkerEnvironment

This class manages environments for dask workers.

Data Handling

finch.data.DimOrder

Type hint for dimension order. If the dimension order is a string, the dimensions are specified by individual characters (‘x’, ‘y’, etc.). With a list of strings, it is possible to give more descriptive names to individual dimensions.
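
For illustration, the same dimension order can be expressed in both forms (the verbose names below are illustrative, taken from the translate_order() example further down):

    # Compact form: one character per dimension.
    order_compact: str = "xyz"

    # Verbose form: one descriptive name per dimension.
    order_verbose: list[str] = ["x", "y", "generalVerticalLayer"]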

finch.data.auto_chunk_size : int = -1

The chunk size used for the “auto” keyword.

finch.data.simplify_chunks(...) Mapping[Hashable, int | tuple[int, ...]]

Simplifies a chunks dictionary by resolving “auto” and removing None entries.
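
A minimal sketch of the described transformation, assuming that “auto” resolves to finch.data.auto_chunk_size and that None entries are dropped (the concrete signature is abbreviated above, so the call is illustrative):

    import finch.data

    chunks = {"x": "auto", "y": None, "z": 10}
    simplified = finch.data.simplify_chunks(chunks)
    # Expected result under the stated assumptions:
    # {"x": finch.data.auto_chunk_size, "z": 10}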

finch.data.get_chunk_sizes(s: int, d: int) list[int]

Returns a list of explicit chunk sizes from a single chunk size.

finch.data.chunk_args_equal(c1, ...) bool

Returns whether two xarray chunk arguments are equal. Auto and None chunk arguments will always be equal. If a dimension name is not present, its size will be interpreted as None.

finch.data.can_rechunk_no_split(c1, ...) bool

Returns True if c1 can be rechunked according to c2 without the need to split up any chunks.

finch.data.adjust_dims(dims: Iterable[Hashable], array) DataArray

Return a new DataArray with the same content as array such that the dimensions match dims in content and order. This is achieved with a combination of expand_dims, squeeze and transpose. An error is raised when trying to remove dimensions with sizes larger than 1.
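
A sketch of the intended use, assuming an xarray DataArray with dimensions ("y", "x") and an illustrative target order:

    import numpy as np
    import xarray as xr
    import finch.data

    array = xr.DataArray(np.zeros((4, 5)), dims=("y", "x"))
    # Reorder (and, if needed, expand or squeeze) dimensions to match ("x", "y", "z").
    adjusted = finch.data.adjust_dims(["x", "y", "z"], array)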

finch.data.get_dim_order_list(order: str | list[str]) list[str]

Transforms a dimension order into list form.

finch.data.translate_order(order, ...) str | list[str]

Translates a dimension order from compact form to verbose form or vice versa. A dimension order in compact form is a string where each letter represents a dimension (e.g. “xyz”). A dimension order in verbose form is a list of dimension names (e.g. [“x”, “y”, “generalVerticalLayer”]).
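
An illustrative call; the second argument is an assumption about the abbreviated signature (a mapping from single characters to descriptive dimension names):

    import finch.data

    verbose = finch.data.translate_order(
        "xyz", {"x": "x", "y": "y", "z": "generalVerticalLayer"}
    )
    # Expected result under this assumption: ["x", "y", "generalVerticalLayer"]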

finch.data.load_array_grib(path, ...) DataArray

Loads a DataArray from a given grib file.

finch.data.load_grib(grib_file, ...) Dataset

Convenience function for loading multiple xarray.DataArray objects from a grib file with load_array_grib() and returning them as a dataset.

Input Management

class finch.data.Format(enum.Enum)

Supported file formats

class finch.data.Input

Class for managing experiment inputs on disk.

Experiments

finch.measure_runtimes(...) list[finch.experiments.Runtime]

Measures the runtimes of multiple run configurations.

class finch.DaskRunConfig(finch.RunConfig)

A run configuration class for running operators on a dask cluster.

class finch.DaskRuntime(finch.Runtime)

A class for reporting runtimes of a dask operator.

class finch.OperatorRunConfig(finch.DaskRunConfig)

A run configuration class for running operators conforming to the standard operator signature.

class finch.RunConfig(finch.util.Config, abc.ABC)

Class for configuring and setting up the environment for experiments.

class finch.Runtime

A class for capturing runtimes of different stages. The runtimes can be categorized as serial for serial overheads or parallel for runtimes in parallel regions.

Evaluation

finch.evaluation.get_pyplot_grouped_bar_pos(...) tuple[ndarray, float]

Returns an array of bar positions when trying to create a grouped bar plot for pyplot, along with the width of an individual bar. A row in the returned array contains the bar positions for a label, while a column contains the bar positions for a group.

finch.evaluation.print_version_results(results, ...) None

Prints the results of an experiment for different input versions.

finch.evaluation.print_results(results, ...) None

Prints the results of an experiment for different run configurations and input versions.

finch.evaluation.exp_name_attr = 'name'

The name of the attribute storing the experiment name in the results dataset.

finch.evaluation.rt_combined_attr = 'rt_combined'

The name of the attribute storing the list of combined runtimes in the results dataset.

finch.evaluation.create_result_dataset(results, ...) Dataset

Constructs a dataset from the results of an experiment. The dimensions are given by the attributes of the Version and RunConfig classes. The coordinates are labels for the version and run config attributes. The array entries in the dataset are the different runtimes which were recorded. This result dataset can then be used as an input for different evaluation functions. The result dataset will contain NaN for every combination of version and run config attributes which is not listed in versions.

finch.evaluation.create_cores_dimension(...) Dataset

Merges the dimensions in the results array which contribute to the total amount of cores into a single ‘cores’ dimension. The number of cores is calculated as the product of the coordinates of the individual dimensions. The resulting dimension is sorted in increasing core order.

finch.evaluation.rename_labels(results: Dataset, ...) Dataset

Renames labels for some dimensions. This changes the coordinates in the results dataset.

finch.evaluation.remove_labels(results: Dataset, ...) Dataset

Removes the given labels in the given main dimension from the results array.

finch.evaluation.combine_runtimes(results: Dataset, ...) Dataset

Combines different runtimes together into a new runtime by adding them up.

finch.evaluation.simple_lin_reg(x, ...) tuple[ndarray, ndarray]

Performs simple linear regression along the given axis.

finch.evaluation.speedup(runtimes: ndarray, ...) ndarray

Calculates the speedup for an array of runtimes.

finch.evaluation.find_scaling(scale, ...) tuple[ndarray, ndarray]

Returns the scaling factor and scaling rate for a series of speedups. This is done via regression on functions of the type $y = \alpha \cdot x^\beta$. $\alpha$ indicates the scaling factor and $\beta$ the scaling rate. This assumes that the speedup for scale = 1 is 1.

finch.evaluation.amdahl_speedup(f: ndarray, c: ndarray) ndarray

Returns the speedups for an array of serial runtime fractions and a selection of core counts.
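
Amdahl’s law gives the speedup on c cores for a serial fraction f as 1 / (f + (1 - f) / c). A small numpy illustration of that formula, shown directly rather than through the finch call:

    import numpy as np

    f = np.array([0.05, 0.1])        # serial runtime fractions
    c = np.array([1, 2, 4, 8, 16])   # core counts

    # Amdahl's law: speedup on c cores for a serial fraction f.
    speedup = 1.0 / (f[:, None] + (1.0 - f[:, None]) / c[None, :])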

finch.evaluation.serial_overhead_analysis(t, ...) ndarray

Estimates the serial fraction of the total runtime. This is done via the closed-form solution of least squares regression with Amdahl’s law.
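
The closed-form estimate can be sketched with ordinary least squares: modelling the measured runtime as t(c) = a + b / c (Amdahl’s law up to the single-core runtime), the serial fraction is f = a / (a + b). The helper below is an illustrative reimplementation of that idea, not the finch function itself:

    import numpy as np

    def estimate_serial_fraction(t: np.ndarray, c: np.ndarray) -> float:
        """Least-squares fit of t(c) = a + b / c, returning f = a / (a + b)."""
        design = np.stack([np.ones_like(c, dtype=float), 1.0 / c], axis=1)
        (a, b), *_ = np.linalg.lstsq(design, t, rcond=None)
        return a / (a + b)

    runtimes = np.array([10.0, 5.5, 3.2, 2.1])  # measured runtimes
    cores = np.array([1, 2, 4, 8])              # corresponding core counts
    f = estimate_serial_fraction(runtimes, cores)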

finch.evaluation.store_config(results: Dataset) None

Stores the configuration of the runtime experiment as a YAML file. The configuration consists of the coordinate values of the results array.

Plotting

finch.evaluation.get_plots_dir(results) os.PathLike | str

Returns the path to the directory where plots should be stored for a specific results dataset.

finch.evaluation.plot_style = {'axes.axisbelow': True, 'axes.edgecolor': '#6272a4', 'axes.facecolor': '#282a36', 'axes.grid': True, 'axes.grid.axis': 'y', 'axes.labelcolor': '#6272a4', 'axes.labelpad': 18, 'axes.linewidth': 0.3, 'axes.prop_cycle': cycler('color', ['#8be9fd', '#ffb86c', '#50fa7b', '#ff5555', '#bd93f9', '#ff79c6', '#44475a', '#f1fa8c']), 'axes.spines.bottom': False, 'axes.spines.left': False, 'axes.spines.right': False, 'axes.spines.top': False, 'axes.titlepad': 40, 'axes.titlesize': 14, 'axes.xmargin': 0, 'axes.ymargin': 0, 'boxplot.boxprops.color': '#f8f8f2', 'boxplot.capprops.color': '#f8f8f2', 'boxplot.flierprops.color': '#f8f8f2', 'boxplot.flierprops.markeredgecolor': '#f8f8f2', 'boxplot.whiskerprops.color': '#f8f8f2', 'figure.edgecolor': '#282a36', 'figure.facecolor': '#282a36', 'font.size': 14, 'grid.color': '#6272a4', 'grid.linewidth': 0.3, 'legend.framealpha': 0, 'lines.color': '#f8f8f2', 'patch.edgecolor': '#f8f8f2', 'savefig.edgecolor': '#282a36', 'savefig.facecolor': '#282a36', 'text.color': '#6272a4', 'xtick.color': '#6272a4', 'xtick.major.width': 0.4, 'xtick.minor.bottom': False, 'xtick.minor.top': False, 'ytick.color': '#6272a4', 'ytick.major.width': 0.3, 'ytick.minor.left': False, 'ytick.right': False}

The plot style to use for creating plots.

finch.evaluation.create_plots(...) None

Creates a series of plots for the results array. The plot creation works as follows. Every plot has multiple different lines, which correspond to the different implementations. The y-axis indicates the value of the result, while the x-axis is a dimension of the result array. For every dimension of size greater than 1, except for the ‘imp’ dimension, a new plot will be created. The other dimensions will then be reduced by flattening and then reducing according to the given reduction function.

finch.evaluation.plot_runtime_parts(results: Dataset, ...) None

Plots how the full runtimes are split up.

Dask Configuration

class finch.scheduler.ClusterConfig(finch.util.Config)

A configuration class for configuring a dask SLURM cluster.

finch.scheduler.client : distributed.client.Client | None = None

The currently active dask client.

finch.scheduler.start_slurm_cluster(...) Client

Starts a new SLURM cluster with the given config and returns a client for it. If a cluster is already running with a different config, it is shut down.

finch.scheduler.start_scheduler(...) distributed.client.Client | None

Starts a new scheduler either in debug or run mode.

finch.scheduler.clear_memory() None

Clears the memory of the current scheduler and workers. Attention: This function currently raises a NotImplementedError, because dask currently provides no efficient way of clearing the memory of the scheduler.

finch.scheduler.get_client() distributed.client.Client | None

Returns the currently registered client.

finch.scheduler.scale_and_wait(n: int) None

Scales the current registered cluster to n workers and waits for them to start up.

Utility Functions and Classes

finch.scheduler.dask_config_get_not_none(key: str, default) Any

Returns the value of dask.config.get(key, default), falling back to default if that call would return None.

finch.scheduler.parse_slurm_time(t: str) timedelta

Returns a timedelta parsed from the given duration string in the format accepted by SLURM.

finch.util.check_socket_open(host: str = 'localhost', ...) bool

Return whether a port is in use / open (True) or not (False).

finch.util.PathLike = os.PathLike | str

Type alias for path types, as recommended by PEP 519.

finch.util.get_absolute(path: os.PathLike | str, ...) Path

Return the absolute path in the given context if a relative path was given. If an absolute path is given, it is returned directly.

finch.util.get_path(*args: os.PathLike | str) Path

Returns a new path by joining the given path arguments. If the directories do not exist yet, they will be created.

finch.util.remove_if_exists(path: os.PathLike | str) Path

Removes the given directory if it exists and returns the original path.

finch.util.clear_dir(path: os.PathLike | str) None

Removes the content of the given directory.

finch.util.PbarArg = bool | tqdm.std.tqdm

Argument type for handling progress bars. Functions accepting a progress bar argument can report their progress via a tqdm progress bar. The argument can either be a boolean, indicating whether a new progress bar should be created or no progress bar should be used at all, or a preexisting progress bar which will be updated.

finch.util.get_pbar(pbar, ...) tqdm.std.tqdm | None

Convenience function for handling progress bar arguments. It ensures that a tqdm progress bar is returned if one was requested, and None otherwise.
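
A sketch of how a function can accept such an argument; the second parameter of get_pbar (a total step count) is an assumption about the abbreviated signature:

    import finch.util

    def process(items: list, pbar: finch.util.PbarArg = True) -> None:
        # Resolve the argument into a tqdm instance or None.
        # The total-count argument is assumed, not confirmed by the signature above.
        bar = finch.util.get_pbar(pbar, len(items))
        for item in items:
            ...  # actual work
            if bar is not None:
                bar.update(1)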

finch.util.fill_none_properties(x: T, y: T) T

Return x as a copy, where every attribute which is None is set to the attribute of y.

finch.util.add_missing_properties(x: T, y: object) T

Return x as a copy, with attributes from y added to x which were not already present.

finch.util.equals_not_none(x: object, y: object) bool

Compares the common properties of the two given objects. Return True if the not-None properties present in both objects are all equal.

finch.util.has_attributes(x: object, y: object, ...) bool

Return True if y has the same not-None attributes as x.

finch.util.get_class_attribute_names(cls: type, ...) list[str]

Return the attribute names of a class.

finch.util.get_class_attributes(obj: object) dict[str, Any]

Return the class attributes of an object as a dictionary.

finch.util.sig_matches_hint(sig: Signature, hint: Any) bool

Return True if the function signature and the Callable type hint match.
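
An illustrative check using the documented parameter types; the expected result is inferred from the description, not verified against the implementation:

    import inspect
    from collections.abc import Callable
    import finch.util

    def add_one(x: int) -> int:
        return x + 1

    sig = inspect.signature(add_one)
    matches = finch.util.sig_matches_hint(sig, Callable[[int], int])  # expected: True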

finch.util.list_funcs_matching(...) list[collections.abc.Callable]

Returns a list of functions from a module matching the given parameters.

class finch.util.Config

Base class for configuration classes. Classes inheriting from this class must be dataclasses (with the @dataclass decorator).
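
A minimal sketch of declaring a configuration class under the stated requirement; the fields are illustrative, not part of the API:

    from dataclasses import dataclass

    import finch.util

    @dataclass
    class MyExperimentConfig(finch.util.Config):
        workers: int = 1
        input_version: str = "default"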

finch.util.flatten_dict(d: dict, separator: str = '_') dict

Flattens a dictionary. The keys of the inner dictionary are appended to the outer key with the given separator.
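
An illustration of the described behaviour; the expected output is inferred from the description, not verified against the implementation:

    import finch.util

    nested = {"cluster": {"workers": 4, "memory": "8GB"}, "debug": False}
    flat = finch.util.flatten_dict(nested)
    # Expected under the description above:
    # {"cluster_workers": 4, "cluster_memory": "8GB", "debug": False}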

finch.util.recursive_update(d: dict, updates: dict) dict

Returns a copy of d with its content replaced by updates wherever specified. Nested dictionaries won’t be replaced but are updated recursively as specified by updates.
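
An illustration of the recursive merge; the expected result is inferred from the description:

    import finch.util

    base = {"cluster": {"workers": 4, "memory": "8GB"}, "debug": False}
    updates = {"cluster": {"workers": 8}}
    merged = finch.util.recursive_update(base, updates)
    # Expected: {"cluster": {"workers": 8, "memory": "8GB"}, "debug": False}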

class finch.util.RecursiveNamespace(types.SimpleNamespace)

A types.SimpleNamespace which can handle nested dictionaries.

finch.util.is_list_of(val: list[Any], typ) TypeGuard[list[T]]

Type guard for checking lists.
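
A usage sketch: inside the guarded branch a static type checker can treat the list as list[int]:

    from typing import Any
    import finch.util

    values: list[Any] = [1, 2, 3]
    if finch.util.is_list_of(values, int):
        total = sum(values)  # values is narrowed to list[int] here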

finch.util.is_2d_list_of(val, ...) TypeGuard[list[list[T]]]

Type guard for checking lists of lists.

finch.util.is_callable_list(...) TypeGuard[list[collections.abc.Callable]]

Type guard for checking that a list contains callable objects.

finch.util.random_entity_name(excludes: list[str] = []) str

Return a random name for an entity, such as a file or a variable.

finch.util.funcs_from_args(f, ...) list[collections.abc.Callable]

Takes a function f and a list of arguments args and returns a list of functions which are the partial applications of f onto args.
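
The same idea expressed with functools.partial; the argument format accepted by funcs_from_args itself is abbreviated above, so this is only equivalent in spirit:

    from functools import partial

    def run_experiment(workers: int, version: str) -> None:
        ...

    args = [{"workers": 1, "version": "v1"}, {"workers": 8, "version": "v2"}]
    funcs = [partial(run_experiment, **a) for a in args]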

finch.util.ImgSuffix

A literal for image file suffixes.