API
General
- finch.environment.package_root = '/home/runner/work/finch/finch/src/finch'
The root directory of the finch package.
- finch.environment.data_dir = '/home/runner/work/finch/finch/src/finch/data'
The directory of the data files.
- finch.environment.proj_config = '/home/runner/work/finch/finch/src/finch/data/config/default.ini'
The location of the project configuration file.
- finch.environment.version_file = '/home/runner/work/finch/finch/src/finch/data/VERSION'
The location of the file specifying the version of the finch package.
- finch.environment.default_custom_config = 'finch.ini'
The default location for a custom configuration file.
- finch.environment.custom_config_env_var = 'CONFIG'
The name of the environment variable specifying the location of a custom configuration file.
- finch.environment.node_name_env_var = 'SLURMD_NODENAME'
The name of the environment variable holding the name of the current SLURM node.
- finch.environment.get_version() Version
Returns the current version of the finch package (see the sketch after this list).
- class finch.environment.WorkerEnvironment
This class manages environments for dask workers.
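A minimal usage sketch for the environment constants and get_version() listed above. It assumes that finch reads the CONFIG environment variable (custom_config_env_var) at import time to locate a custom configuration file; the file path used here is a placeholder.

    import os

    # Assumption: finch looks up the CONFIG environment variable when it is
    # imported; the path below is only a placeholder for a user-provided
    # finch.ini (see default_custom_config above).
    os.environ["CONFIG"] = "/path/to/finch.ini"

    import finch.environment as env

    print(env.package_root)   # root directory of the installed package
    print(env.proj_config)    # bundled default configuration file
    print(env.get_version())  # version read from the VERSION data file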
Data Handling
- finch.data.DimOrder
Type hint for dimension order. If the dimension order is a string, the dimensions are specified by individual characters (‘x’, ‘y’, etc.). With a list of strings, it is possible to give more descriptive names to individual dimensions.
- finch.data.auto_chunk_size : int = -1
The chunk size used for the “auto” keyword.
- finch.data.simplify_chunks(...) Mapping[Hashable, int | tuple[int, ...]]
Simplifies a chunks dictionary by resolving “auto” and removing None entries.
- finch.data.get_chunk_sizes(s: int, d: int) list[int]
Returns a list of explicit chunk sizes from a single chunk size.
- finch.data.chunk_args_equal(c1, ...) bool
Returns whether two xarray chunk arguments are equal. Auto and None chunk arguments will always be equal. If a dimension name is not present, its size will be interpreted as None.
- finch.data.can_rechunk_no_split(c1, ...) bool
Returns True if c1 can be rechunked according to c2 without the need to split up any chunks.
- finch.data.adjust_dims(dims: Iterable[Hashable], array) DataArray
Return a new DataArray with the same content as array such that the dimensions match dims in content and order. This is achieved with a combination of expand_dims, squeeze and transform. When trying to remove dimensions with sizes larger than 1, an error will be thrown. See the sketch after this list.
- finch.data.get_dim_order_list(order: str | list[str]) list[str]
Transforms a dimension order into list form.
- finch.data.translate_order(order, ...) str | list[str]
Translates a dimension order from compact form to verbose form or vice versa. A dimension order in compact form is a string where each letter represents a dimension (e.g. “xyz”). A dimension order in verbose form is a list of dimension names (e.g. [“x”, “y”, “generalVerticalLayer”]).
- finch.data.load_array_grib(path, ...) DataArray
Loads a DataArray from a given grib file.
- finch.data.load_grib(grib_file, ...) Dataset
Convenience function for loading multiple xarray.DataArray objects from a grib file with load_array_grib() and returning them as a dataset.
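A short sketch of the dimension and chunking helpers above. The meaning of the two get_chunk_sizes arguments (chunk size first, dimension size second) is an assumption, since the listing does not spell it out.

    import numpy as np
    import xarray as xr
    import finch.data

    # Reorder and expand an array so that its dimensions match ["y", "x", "z"].
    array = xr.DataArray(np.zeros((4, 3)), dims=["x", "y"])
    adjusted = finch.data.adjust_dims(["y", "x", "z"], array)
    print(adjusted.dims)  # expected: ('y', 'x', 'z'), with "z" of size 1

    # Expand a compact dimension order into list form.
    print(finch.data.get_dim_order_list("xyz"))  # ['x', 'y', 'z']

    # Split a dimension of size 10 into explicit chunks of size (at most) 4.
    # Assumption: s is the chunk size and d the dimension size.
    print(finch.data.get_chunk_sizes(4, 10))  # e.g. [4, 4, 2]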
Input Management
- class finch.data.Format(enum.Enum)
Supported file formats
- class finch.data.Input
Class for managing experiment inputs on disk.
Experiments
- finch.measure_runtimes(...) list[finch.experiments.Runtime]
Measures the runtimes of multiple run configurations (see the sketch after this list).
- class finch.DaskRunConfig(finch.RunConfig)
A run configuration class for running operators on a dask cluster.
- class finch.DaskRuntime(finch.Runtime)
A class for reporting runtimes of a dask operator.
- class finch.OperatorRunConfig(finch.DaskRunConfig)
A run configuration class for running operators conforming to the standard operator signature.
- class finch.RunConfig(finch.util.Config, abc.ABC)
Class for configuring and setting up the environment for experiments.
- class finch.Runtime
A class for capturing runtimes of different stages. The runtimes can be categorized into serial for serial overheads or parallel for runtimes in parallel regions.
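The sketch below only illustrates how the experiment classes above relate to each other; the constructor arguments of the run configurations and the parameters of measure_runtimes are elided in this listing, so no concrete experiment is set up here.

    import finch

    # Class hierarchy as documented above:
    #   RunConfig (abstract) -> DaskRunConfig -> OperatorRunConfig
    #   Runtime              -> DaskRuntime
    assert issubclass(finch.OperatorRunConfig, finch.DaskRunConfig)
    assert issubclass(finch.DaskRunConfig, finch.RunConfig)
    assert issubclass(finch.DaskRuntime, finch.Runtime)

    # finch.measure_runtimes(...) takes prepared run configurations and
    # returns one finch.Runtime per configuration; its exact parameters
    # are not shown in this listing.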
Evaluation
- finch.evaluation.get_pyplot_grouped_bar_pos(...) tuple[ndarray, float]
Returns an array of bar positions when trying to create a grouped bar plot for pyplot, along with the width of an individual bar. A row in the returned array contains the bar positions for a label, while a column contains the bar positions for a group.
- finch.evaluation.print_version_results(results, ...) None
Prints the results of an experiment for different input versions.
- finch.evaluation.print_results(results, ...) None
Prints the results of an experiment for different run configurations and input versions.
- finch.evaluation.exp_name_attr = 'name'
The name of the attribute storing the experiment name in the results dataset.
- finch.evaluation.rt_combined_attr = 'rt_combined'
The name of the attribute storing the list of combined runtimes in the results dataset.
- finch.evaluation.create_result_dataset(results, ...) Dataset
Constructs a dataset from the results of an experiment. The dimensions are given by the attributes of the Version and RunConfig classes. The coordinates are labels for the version and run config attributes. The array entries in the dataset are the different runtimes which were recorded. This result dataset can then be used as an input for different evaluation functions. The result dataset will contain NaN for every combination of version and run config attributes which is not listed in versions.
- finch.evaluation.create_cores_dimension(...) Dataset
Merges the dimensions in the results array which contribute to the total number of cores into a single ‘cores’ dimension. The number of cores is calculated as the product of the coordinates of the individual dimensions. The resulting dimension is sorted in increasing core order.
- finch.evaluation.rename_labels(results: Dataset, ...) Dataset
Renames labels for some dimensions. This changes the coordinates in the results dataset.
- finch.evaluation.remove_labels(results: Dataset, ...) Dataset
Removes the given labels in the given main dimension from the results array.
- finch.evaluation.combine_runtimes(results: Dataset, ...) Dataset
Combines different runtimes into a new runtime by adding them up.
- finch.evaluation.simple_lin_reg(x, ...) tuple[ndarray, ndarray]
Performs simple linear regression along the given axis.
- finch.evaluation.speedup(runtimes: ndarray, ...) ndarray
Calculates the speedup for an array of runtimes.
- finch.evaluation.find_scaling(scale, ...) tuple[ndarray, ndarray]
Returns the scaling factor and scaling rate for a series of speedups. This is done via regression on functions of the type $y = \alpha \cdot x^\beta$. $\alpha$ indicates the scaling factor and $\beta$ the scaling rate. This assumes that the speedup for scale = 1 is 1.
- finch.evaluation.amdahl_speedup(f: ndarray, c: ndarray) ndarray
Returns the speedups for an array of serial runtime fractions and a selection of core counts (see the sketch after this list).
- finch.evaluation.serial_overhead_analysis(t, ...) ndarray
Estimates the serial fraction of the total runtime. This is done via the closed-form solution of least squares regression with Amdahl’s law.
- finch.evaluation.store_config(results: Dataset) None
Stores the configuration of the runtime experiment as a YAML file. The configuration consists of the coordinate values of the results array.
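A hedged sketch of the speedup helpers above. Amdahl's law, $S(c) = 1 / (f + (1 - f) / c)$, is the model behind amdahl_speedup; the broadcasting of f against c and the optional arguments of speedup() are not documented in this listing and are assumed here.

    import numpy as np
    import finch.evaluation as ev

    f = np.array([0.1])             # 10% of the runtime is serial
    c = np.array([1, 2, 4, 8, 16])  # core counts to evaluate

    # Amdahl's law: S(c) = 1 / (f + (1 - f) / c)
    print(ev.amdahl_speedup(f, c))

    # speedup() converts measured runtimes into speedups; which runtime acts
    # as the baseline is controlled by arguments elided in this listing.
    runtimes = np.array([10.0, 5.2, 2.8, 1.6])
    print(ev.speedup(runtimes))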
Plotting
- finch.evaluation.get_plots_dir(results) os.PathLike | str
Returns the path to the directory where plots should be stored for a specific results dataset.
- finch.evaluation.plot_style =
{'axes.axisbelow': True, 'axes.edgecolor': '#6272a4', 'axes.facecolor': '#282a36', 'axes.grid': True, 'axes.grid.axis': 'y', 'axes.labelcolor': '#6272a4', 'axes.labelpad': 18, 'axes.linewidth': 0.3, 'axes.prop_cycle': cycler('color', ['#8be9fd', '#ffb86c', '#50fa7b', '#ff5555', '#bd93f9', '#ff79c6', '#44475a', '#f1fa8c']), 'axes.spines.bottom': False, 'axes.spines.left': False, 'axes.spines.right': False, 'axes.spines.top': False, 'axes.titlepad': 40, 'axes.titlesize': 14, 'axes.xmargin': 0, 'axes.ymargin': 0, 'boxplot.boxprops.color': '#f8f8f2', 'boxplot.capprops.color': '#f8f8f2', 'boxplot.flierprops.color': '#f8f8f2', 'boxplot.flierprops.markeredgecolor': '#f8f8f2', 'boxplot.whiskerprops.color': '#f8f8f2', 'figure.edgecolor': '#282a36', 'figure.facecolor': '#282a36', 'font.size': 14, 'grid.color': '#6272a4', 'grid.linewidth': 0.3, 'legend.framealpha': 0, 'lines.color': '#f8f8f2', 'patch.edgecolor': '#f8f8f2', 'savefig.edgecolor': '#282a36', 'savefig.facecolor': '#282a36', 'text.color': '#6272a4', 'xtick.color': '#6272a4', 'xtick.major.width': 0.4, 'xtick.minor.bottom': False, 'xtick.minor.top': False, 'ytick.color': '#6272a4', 'ytick.major.width': 0.3, 'ytick.minor.left': False, 'ytick.right': False}
The plot style to use for creating plots (see the sketch after this list).
- finch.evaluation.create_plots(...) None
Creates a series of plots for the results array. The plot creation works as follows. Every plot has multiple different lines, which correspond to the different implementations. The y-axis indicates the value of the result, while the x-axis is a dimension of the result array. For every dimension of size greater than 1, except for the ‘imp’ dimension, a new plot will be created. The other dimensions will then be reduced by flattening and then reducing according to the given reduction function.
- finch.evaluation.plot_runtime_parts(results: Dataset, ...) None
Plots how the full runtimes are split up.
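plot_style above is a dictionary of matplotlib rcParams. The sketch below applies it manually through matplotlib's style context; finch's own plotting helpers (create_plots, plot_runtime_parts) presumably apply it for you, so this is purely illustrative.

    import matplotlib.pyplot as plt
    import finch.evaluation as ev

    # Apply the finch plot style only within this context.
    with plt.style.context(ev.plot_style):
        fig, ax = plt.subplots()
        ax.bar(["serial", "dask"], [10.0, 2.8])
        ax.set_title("runtime [s]")
        fig.savefig("runtimes.png")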
Dask Configuration
- class finch.scheduler.ClusterConfig(finch.util.Config)
A configuration class for configuring a dask SLURM cluster.
- finch.scheduler.client : distributed.client.Client | None = None
The currently active dask client.
- finch.scheduler.start_slurm_cluster(...) Client
Starts a new SLURM cluster with the given config and returns a client for it. If a cluster is already running with a different config, it is shut down.
- finch.scheduler.start_scheduler(...) distributed.client.Client | None
Starts a new scheduler either in debug or run mode.
- finch.scheduler.clear_memory() None
Clears the memory of the current scheduler and workers. Attention: this function currently raises a NotImplementedError, because dask provides no efficient way of clearing the memory of the scheduler.
- finch.scheduler.get_client() distributed.client.Client | None
Returns the currently registered client.
- finch.scheduler.scale_and_wait(n: int) None
Scales the currently registered cluster to n workers and waits for them to start up (see the sketch after this list).
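A sketch of the scheduler helpers above. The parameters of start_scheduler are elided in this listing, so the argument-free call below is an assumption about its defaults.

    import finch.scheduler as scheduler

    # Start a scheduler in run or debug mode (defaults assumed).
    client = scheduler.start_scheduler()

    if client is not None:
        # Grow the registered cluster to 4 workers and wait until they are up.
        scheduler.scale_and_wait(4)
        print(scheduler.get_client())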
Utility Functions and Classes
- finch.scheduler.dask_config_get_not_none(key: str, default) Any
Returns the value of dask.config.get(key, default), or the default if None would be returned.
- finch.scheduler.parse_slurm_time(t: str) timedelta
Returns a timedelta from the given duration string in the format passed to SLURM.
- finch.util.check_socket_open(host: str = 'localhost', ...) bool
Return whether a port is in use / open (True) or not (False).
- finch.util.PathLike = os.PathLike | str
Type alias for path types, as recommended by PEP 519.
- finch.util.get_absolute(path: os.PathLike | str, ...) Path
Return the absolute path in the given context if a relative path was given. If an absolute path is given, it is returned directly.
- finch.util.get_path(*args: os.PathLike | str) Path
Returns a new path by joining the given path arguments. If the directories do not exist yet, they will be created.
- finch.util.remove_if_exists(path: os.PathLike | str) Path
Removes the given directory if it exists and returns the original path.
- finch.util.clear_dir(path: os.PathLike | str) None
Removes the content of the given directory.
- finch.util.PbarArg = bool | tqdm.std.tqdm
Argument type for handling progress bars. Functions accepting the progress bar argument support outputting their progress via a tqdm progress bar. The argument can either be a boolean, indicating that a new progress bar should be created or that no progress bar should be used at all, or a preexisting progress bar which will be updated.
- finch.util.get_pbar(pbar, ...) tqdm.std.tqdm | None
Convenience function for handling progress bar arguments. This makes sure that a tqdm progress bar is returned if one is requested, or None otherwise.
- finch.util.fill_none_properties(x: T, y: T) T
Return x as a copy, where every attribute which is None is set to the attribute of y.
- finch.util.add_missing_properties(x: T, y: object) T
Return x as a copy, with attributes from y added to x which were not already present.
- finch.util.equals_not_none(x: object, y: object) bool
Compares the common properties of the two given objects. Returns True if the not-None properties present in both objects are all equal.
- finch.util.has_attributes(x: object, y: object, ...) bool
Return True if y has the same not-None attributes as x.
- finch.util.get_class_attribute_names(cls: type, ...) list[str]
Return the attribute names of a class.
- finch.util.get_class_attributes(obj: object) dict[str, Any]
Return the class attributes of an object as a dictionary.
- finch.util.sig_matches_hint(sig: Signature, hint: Any) bool
Return True if the function signature and the Callable type hint match.
- finch.util.list_funcs_matching(...) list[collections.abc.Callable]
Returns a list of functions from a module matching the given parameters.
- class finch.util.Config
Base class for configuration classes. Classes inheriting from this class must be dataclasses (with the @dataclass decorator).
- finch.util.flatten_dict(d: dict, separator: str = '_') dict
Flattens a dictionary. The keys of the inner dictionary are appended to the outer key with the given separator (see the sketch after this list).
- finch.util.recursive_update(d: dict, updates: dict) dict
Returns a copy of d with its content replaced by updates wherever specified. Nested dictionaries won’t be replaced, but updated recursively as specified by updates.
- class finch.util.RecursiveNamespace(types.SimpleNamespace)
A types.SimpleNamespace which can handle nested dictionaries.
- finch.util.is_list_of(val: list[Any], typ) TypeGuard[list[T]]
Type guard for checking lists.
- finch.util.is_2d_list_of(val, ...) TypeGuard[list[list[T]]]
Type guard for checking lists of lists.
- finch.util.is_callable_list(...) TypeGuard[list[collections.abc.Callable]]
Type guard for checking that a list contains callable objects.
- finch.util.random_entity_name(excludes: list[str] = []) str
Return a random name for an entity, such as a file or a variable.
- finch.util.funcs_from_args(f, ...) list[collections.abc.Callable]
Takes a function f and a list of arguments args and returns a list of functions which are the partial applications of f onto args.
- finch.util.ImgSuffix
A literal for image file suffixes.
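A few of the utilities above in action. The exact SLURM duration format accepted by parse_slurm_time (HH:MM:SS here) is an assumption; the listing only states that it matches what SLURM accepts.

    from finch import scheduler, util

    cfg = {"cluster": {"workers": 4, "memory": "8GB"}, "debug": False}

    # Flatten nested keys with the default '_' separator.
    print(util.flatten_dict(cfg))
    # e.g. {'cluster_workers': 4, 'cluster_memory': '8GB', 'debug': False}

    # Recursively update only the keys that are given.
    patched = util.recursive_update(cfg, {"cluster": {"workers": 8}})
    print(patched["cluster"])  # {'workers': 8, 'memory': '8GB'}

    # Type guard for lists.
    print(util.is_list_of([1, 2, 3], int))  # True

    # Parse a SLURM-style duration into a timedelta (format assumed).
    print(scheduler.parse_slurm_time("12:00:00"))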