API
General
- finch.environment.package_root = '/home/runner/work/finch/finch/src/finch'
The root directory of the finch package.
- finch.environment.data_dir = '/home/runner/work/finch/finch/src/finch/data'
The directory of the data files.
- finch.environment.proj_config = '/home/runner/work/finch/finch/src/finch/data/config/default.ini'
The location of the project configuration file.
- finch.environment.version_file = '/home/runner/work/finch/finch/src/finch/data/VERSION'
The location of the file specifying the version of the finch package.
- finch.environment.default_custom_config = 'finch.ini'
The default location for a custom configuration file.
- finch.environment.custom_config_env_var = 'CONFIG'
The name of the environment variable specifying the location of a custom configuration file.
- finch.environment.node_name_env_var = 'SLURMD_NODENAME'
The name of the environment variable holding the name of the current SLURM node.
- finch.environment.get_version() Version
Returns the current version of the finch package (see the sketch after this list).
- class finch.environment.WorkerEnvironment
This class manages environments for dask workers.
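A minimal usage sketch for the environment constants and get_version() listed above. It assumes that finch reads the CONFIG environment variable (custom_config_env_var) at import time to locate a custom configuration file; the file path used here is a placeholder.

    import os

    # Assumption: finch looks up the CONFIG environment variable when it is
    # imported; the path below is only a placeholder for a user-provided
    # finch.ini (see default_custom_config above).
    os.environ["CONFIG"] = "/path/to/finch.ini"

    import finch.environment as env

    print(env.package_root)   # root directory of the installed package
    print(env.proj_config)    # bundled default configuration file
    print(env.get_version())  # version read from the VERSION data file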
Data Handling
- finch.data.DimOrder
Type hint for dimension order. If the dimension order is a string, the dimensions are specified by individual characters (‘x’, ‘y’, etc.). With a list of strings, it is possible to give more descriptive names to individual dimensions.
- finch.data.auto_chunk_size : int = -1
The chunk size used for the “auto” keyword.
- finch.data.simplify_chunks(...) Mapping[Hashable, int | tuple[int, ...]]
Simplifies a chunks dictionary by resolving “auto” and removing None entries.
- finch.data.get_chunk_sizes(s: int, d: int) list[int]
Returns a list of explicit chunk sizes from a single chunk size.
- finch.data.chunk_args_equal(c1, ...) bool
Returns whether two xarray chunk arguments are equal. Auto and None chunk arguments will always be equal. If a dimension name is not present, its size will be interpreted as None.
- finch.data.can_rechunk_no_split(c1, ...) bool
Returns True if c1 can be rechunked according to c2 without the need to split up any chunks.
- finch.data.adjust_dims(dims: Iterable[Hashable], array) DataArray
Return a new DataArray with the same content as array such that the dimensions match dims in content and order. This is achieved with a combination of expand_dims, squeeze and transform. When trying to remove dimensions with sizes larger than 1, an error will be thrown. See the sketch after this list.
- finch.data.get_dim_order_list(order: str | list[str]) list[str]
Transforms a dimension order into list form.
- finch.data.translate_order(order, ...) str | list[str]
Translates a dimension order from compact form to verbose form or vice versa. A dimension order in compact form is a string where each letter represents a dimension (e.g. “xyz”). A dimension order in verbose form is a list of dimension names (e.g. [“x”, “y”, “generalVerticalLayer”]).
- finch.data.load_array_grib(path, ...) DataArray
Loads a DataArray from a given grib file.
- finch.data.load_grib(grib_file, ...) Dataset
Convenience function for loading multiple xarray.DataArray objects from a grib file with load_array_grib() and returning them as a dataset.
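A short sketch of the dimension and chunking helpers above. The meaning of the two get_chunk_sizes arguments (chunk size first, dimension size second) is an assumption, since the listing does not spell it out.

    import numpy as np
    import xarray as xr
    import finch.data

    # Reorder and expand an array so that its dimensions match ["y", "x", "z"].
    array = xr.DataArray(np.zeros((4, 3)), dims=["x", "y"])
    adjusted = finch.data.adjust_dims(["y", "x", "z"], array)
    print(adjusted.dims)  # expected: ('y', 'x', 'z'), with "z" of size 1

    # Expand a compact dimension order into list form.
    print(finch.data.get_dim_order_list("xyz"))  # ['x', 'y', 'z']

    # Split a dimension of size 10 into explicit chunks of size (at most) 4.
    # Assumption: s is the chunk size and d the dimension size.
    print(finch.data.get_chunk_sizes(4, 10))  # e.g. [4, 4, 2]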
Input Management
- class finch.data.Format(enum.Enum)
Supported file formats
- class finch.data.Input
Class for managing experiment inputs on disk.
Experiments
- finch.measure_runtimes(...) list[finch.experiments.Runtime]
Measures the runtimes of multiple run configurations (see the sketch after this list).
- class finch.DaskRunConfig(finch.RunConfig)
A run configuration class for running operators on a dask cluster.
- class finch.DaskRuntime(finch.Runtime)
A class for reporting runtimes of a dask operator.
- class finch.OperatorRunConfig(finch.DaskRunConfig)
A run configuration class for running operators conforming to the standard operator signature.
- class finch.RunConfig(finch.util.Config, abc.ABC)
Class for configuring and setting up the environment for experiments.
- class finch.Runtime
A class for capturing runtimes of different stages. The runtimes can be categorized into serial for serial overheads or parallel for runtimes in parallel regions.
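The sketch below only illustrates how the experiment classes above relate to each other; the constructor arguments of the run configurations and the parameters of measure_runtimes are elided in this listing, so no concrete experiment is set up here.

    import finch

    # Class hierarchy as documented above:
    #   RunConfig (abstract) -> DaskRunConfig -> OperatorRunConfig
    #   Runtime              -> DaskRuntime
    assert issubclass(finch.OperatorRunConfig, finch.DaskRunConfig)
    assert issubclass(finch.DaskRunConfig, finch.RunConfig)
    assert issubclass(finch.DaskRuntime, finch.Runtime)

    # finch.measure_runtimes(...) takes prepared run configurations and
    # returns one finch.Runtime per configuration; its exact parameters
    # are not shown in this listing.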
Evaluation
- finch.evaluation.get_pyplot_grouped_bar_pos(...) tuple[ndarray, float]
Returns an array of bar positions when trying to create a grouped bar plot for pyplot, along with the width of an individual bar. A row in the returned array contains the bar positions for a label, while a column contains the bar positions for a group.
- finch.evaluation.print_version_results(results, ...) None
Prints the results of an experiment for different input versions.
- finch.evaluation.print_results(results, ...) None
Prints the results of an experiment for different run configurations and input versions.
- finch.evaluation.exp_name_attr = 'name'
The name of the attribute storing the experiment name in the results dataset.
- finch.evaluation.rt_combined_attr = 'rt_combined'
The name of the attribute storing the list of combined runtimes in the results dataset.
- finch.evaluation.create_result_dataset(results, ...) Dataset
Constructs a dataset from the results of an experiment. The dimensions are given by the attributes of the Version and RunConfig classes. The coordinates are labels for the version and run config attributes. The array entries in the dataset are the different runtimes which were recorded. This result dataset can then be used as an input for different evaluation functions. The result dataset will contain NaN for every combination of version and run config attributes which is not listed in versions.
- finch.evaluation.create_cores_dimension(...) Dataset
Merges the dimensions in the results array which contribute to the total number of cores into a single ‘cores’ dimension. The number of cores is calculated as the product of the coordinates of the individual dimensions. The resulting dimension is sorted in increasing core order.
- finch.evaluation.rename_labels(results: Dataset, ...) Dataset
Renames labels for some dimensions. This changes the coordinates in the results dataset.
- finch.evaluation.remove_labels(results: Dataset, ...) Dataset
Removes the given labels in the given main dimension from the results array.
- finch.evaluation.combine_runtimes(results: Dataset, ...) Dataset
Combines different runtimes into a new runtime by adding them up.
- finch.evaluation.simple_lin_reg(x, ...) tuple[ndarray, ndarray]
Performs simple linear regression along the given axis.
- finch.evaluation.speedup(runtimes: ndarray, ...) ndarray
Calculates the speedup for an array of runtimes.
- finch.evaluation.find_scaling(scale, ...) tuple[ndarray, ndarray]
Returns the scaling factor and scaling rate for a series of speedups. This is done via regression on functions of the type $y = \alpha \cdot x^\beta$. $\alpha$ indicates the scaling factor and $\beta$ the scaling rate. This assumes that the speedup for scale = 1 is 1.
- finch.evaluation.amdahl_speedup(f: ndarray, c: ndarray) ndarray
Returns the speedups for an array of serial runtime fractions and a selection of core counts (see the sketch after this list).
- finch.evaluation.serial_overhead_analysis(t, ...) ndarray
Estimates the serial fraction of the total runtime. This is done via the closed-form solution of least squares regression with Amdahl’s law.
- finch.evaluation.store_config(results: Dataset) None
Stores the configuration of the runtime experiment as a YAML file. The configuration consists of the coordinate values of the results array.
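A hedged sketch of the speedup helpers above. Amdahl's law, $S(c) = 1 / (f + (1 - f) / c)$, is the model behind amdahl_speedup; the broadcasting of f against c and the optional arguments of speedup() are not documented in this listing and are assumed here.

    import numpy as np
    import finch.evaluation as ev

    f = np.array([0.1])             # 10% of the runtime is serial
    c = np.array([1, 2, 4, 8, 16])  # core counts to evaluate

    # Amdahl's law: S(c) = 1 / (f + (1 - f) / c)
    print(ev.amdahl_speedup(f, c))

    # speedup() converts measured runtimes into speedups; which runtime acts
    # as the baseline is controlled by arguments elided in this listing.
    runtimes = np.array([10.0, 5.2, 2.8, 1.6])
    print(ev.speedup(runtimes))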
Plotting
- finch.evaluation.get_plots_dir(results) os.PathLike | str
Returns the path to the directory where plots should be stored for a specific results dataset.
- finch.evaluation.plot_style =
{'axes.axisbelow': True, 'axes.edgecolor': '#6272a4', 'axes.facecolor': '#282a36', 'axes.grid': True, 'axes.grid.axis': 'y', 'axes.labelcolor': '#6272a4', 'axes.labelpad': 18, 'axes.linewidth': 0.3, 'axes.prop_cycle': cycler('color', ['#8be9fd', '#ffb86c', '#50fa7b', '#ff5555', '#bd93f9', '#ff79c6', '#44475a', '#f1fa8c']), 'axes.spines.bottom': False, 'axes.spines.left': False, 'axes.spines.right': False, 'axes.spines.top': False, 'axes.titlepad': 40, 'axes.titlesize': 14, 'axes.xmargin': 0, 'axes.ymargin': 0, 'boxplot.boxprops.color': '#f8f8f2', 'boxplot.capprops.color': '#f8f8f2', 'boxplot.flierprops.color': '#f8f8f2', 'boxplot.flierprops.markeredgecolor': '#f8f8f2', 'boxplot.whiskerprops.color': '#f8f8f2', 'figure.edgecolor': '#282a36', 'figure.facecolor': '#282a36', 'font.size': 14, 'grid.color': '#6272a4', 'grid.linewidth': 0.3, 'legend.framealpha': 0, 'lines.color': '#f8f8f2', 'patch.edgecolor': '#f8f8f2', 'savefig.edgecolor': '#282a36', 'savefig.facecolor': '#282a36', 'text.color': '#6272a4', 'xtick.color': '#6272a4', 'xtick.major.width': 0.4, 'xtick.minor.bottom': False, 'xtick.minor.top': False, 'ytick.color': '#6272a4', 'ytick.major.width': 0.3, 'ytick.minor.left': False, 'ytick.right': False}
The plot style to use for creating plots (see the sketch after this list).
- finch.evaluation.create_plots(...) None
Creates a series of plots for the results array. The plot creation works as follows. Every plot has multiple different lines, which correspond to the different implementations. The y-axis indicates the value of the result, while the x-axis is a dimension of the result array. For every dimension of size greater than 1, except for the ‘imp’ dimension, a new plot will be created. The other dimensions will then be reduced by flattening and then reducing according to the given reduction function.
- finch.evaluation.plot_runtime_parts(results: Dataset, ...) None
Plots how the full runtimes are split up.
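plot_style above is a dictionary of matplotlib rcParams. The sketch below applies it manually through matplotlib's style context; finch's own plotting helpers (create_plots, plot_runtime_parts) presumably apply it for you, so this is purely illustrative.

    import matplotlib.pyplot as plt
    import finch.evaluation as ev

    # Apply the finch plot style only within this context.
    with plt.style.context(ev.plot_style):
        fig, ax = plt.subplots()
        ax.bar(["serial", "dask"], [10.0, 2.8])
        ax.set_title("runtime [s]")
        fig.savefig("runtimes.png")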
Dask Configuration
- class finch.scheduler.ClusterConfig(finch.util.Config)
A configuration class for configuring a dask SLURM cluster.
- finch.scheduler.client : distributed.client.Client | None = None
The currently active dask client.
- finch.scheduler.start_slurm_cluster(...) Client
Starts a new SLURM cluster with the given config and returns a client for it. If a cluster is already running with a different config, it is shut down.
- finch.scheduler.start_scheduler(...) distributed.client.Client | None
Starts a new scheduler either in debug or run mode.
- finch.scheduler.clear_memory() None
Clears the memory of the current scheduler and workers. Attention: this function currently raises a NotImplementedError, because dask provides no efficient way of clearing the memory of the scheduler.
- finch.scheduler.get_client() distributed.client.Client | None
Returns the currently registered client.
- finch.scheduler.scale_and_wait(n: int) None
Scales the currently registered cluster to n workers and waits for them to start up (see the sketch after this list).
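A sketch of the scheduler helpers above. The parameters of start_scheduler are elided in this listing, so the argument-free call below is an assumption about its defaults.

    import finch.scheduler as scheduler

    # Start a scheduler in run or debug mode (defaults assumed).
    client = scheduler.start_scheduler()

    if client is not None:
        # Grow the registered cluster to 4 workers and wait until they are up.
        scheduler.scale_and_wait(4)
        print(scheduler.get_client())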
Utility Functions and Classes
- finch.scheduler.dask_config_get_not_none(key: str, default) Any
Returns the value of dask.config.get(key, default), or the default if None would be returned.
- finch.scheduler.parse_slurm_time(t: str) timedelta
Returns a timedelta from the given duration string in the format passed to SLURM.
- finch.util.check_socket_open(host: str = 'localhost', ...) bool
Return whether a port is in use / open (True) or not (False).
- finch.util.PathLike = os.PathLike | str
Type alias for path types, as recommended by PEP 519.
- finch.util.get_absolute(path: os.PathLike | str, ...) Path
Return the absolute path in the given context if a relative path was given. If an absolute path is given, it is returned directly.
- finch.util.get_path(*args: os.PathLike | str) Path
Returns a new path by joining the given path arguments. If the directories do not exist yet, they will be created.
- finch.util.remove_if_exists(path: os.PathLike | str) Path
Removes the given directory if it exists and returns the original path.
- finch.util.clear_dir(path: os.PathLike | str) None
Removes the content of the given directory.
- finch.util.PbarArg = bool | tqdm.std.tqdm
Argument type for handling progress bars. Functions accepting the progress bar argument support outputting their progress via a tqdm progress bar. The argument can either be a boolean, indicating that a new progress bar should be created or that no progress bar should be used at all, or a preexisting progress bar which will be updated.
- finch.util.get_pbar(pbar, ...) tqdm.std.tqdm | None
Convenience function for handling progress bar arguments. This makes sure that a tqdm progress bar is returned if one is requested, or None otherwise.
- finch.util.fill_none_properties(x: T, y: T) T
Return x as a copy, where every attribute which is None is set to the attribute of y.
- finch.util.add_missing_properties(x: T, y: object) T
Return x as a copy, with attributes from y added to x which were not already present.
- finch.util.equals_not_none(x: object, y: object) bool
Compares the common properties of the two given objects. Returns True if the not-None properties present in both objects are all equal.
- finch.util.has_attributes(x: object, y: object, ...) bool
Return True if y has the same not-None attributes as x.
- finch.util.get_class_attribute_names(cls: type, ...) list[str]
Return the attribute names of a class.
- finch.util.get_class_attributes(obj: object) dict[str, Any]
Return the class attributes of an object as a dictionary.
- finch.util.sig_matches_hint(sig: Signature, hint: Any) bool
Return True if the function signature and the Callable type hint match.
- finch.util.list_funcs_matching(...) list[collections.abc.Callable]
Returns a list of functions from a module matching the given parameters.
- class finch.util.Config
Base class for configuration classes. Classes inheriting from this class must be dataclasses (with the @dataclass decorator).
- finch.util.flatten_dict(d: dict, separator: str = '_') dict
Flattens a dictionary. The keys of the inner dictionary are appended to the outer key with the given separator (see the sketch after this list).
- finch.util.recursive_update(d: dict, updates: dict) dict
Returns a copy of d with its content replaced by updates wherever specified. Nested dictionaries won’t be replaced, but updated recursively as specified by updates.
- class finch.util.RecursiveNamespace(types.SimpleNamespace)
A types.SimpleNamespace which can handle nested dictionaries.
- finch.util.is_list_of(val: list[Any], typ) TypeGuard[list[T]]
Type guard for checking lists.
- finch.util.is_2d_list_of(val, ...) TypeGuard[list[list[T]]]
Type guard for checking lists of lists.
- finch.util.is_callable_list(...) TypeGuard[list[collections.abc.Callable]]
Type guard for checking that a list contains callable objects.
- finch.util.random_entity_name(excludes: list[str] = []) str
Return a random name for an entity, such as a file or a variable.
- finch.util.funcs_from_args(f, ...) list[collections.abc.Callable]
Takes a function f and a list of arguments args and returns a list of functions which are the partial applications of f onto args.
- finch.util.ImgSuffix
A literal for image file suffixes.
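A few of the utilities above in action. The exact SLURM duration format accepted by parse_slurm_time (HH:MM:SS here) is an assumption; the listing only states that it matches what SLURM accepts.

    from finch import scheduler, util

    cfg = {"cluster": {"workers": 4, "memory": "8GB"}, "debug": False}

    # Flatten nested keys with the default '_' separator.
    print(util.flatten_dict(cfg))
    # e.g. {'cluster_workers': 4, 'cluster_memory': '8GB', 'debug': False}

    # Recursively update only the keys that are given.
    patched = util.recursive_update(cfg, {"cluster": {"workers": 8}})
    print(patched["cluster"])  # {'workers': 8, 'memory': '8GB'}

    # Type guard for lists.
    print(util.is_list_of([1, 2, 3], int))  # True

    # Parse a SLURM-style duration into a timedelta (format assumed).
    print(scheduler.parse_slurm_time("12:00:00"))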