qmi.data.dataset

Data structures for measurement data.

Functions

convert_to_qmi_dataset(parent)

Convert an HDF5 dataset, located in a group or in the file root, to a QMI dataset.

read_dataset_from_hdf5(parent[, container])

Extract a QMI DataSet instance from the specified HDF5 dataset (group).

read_dataset_from_text(fh)

Read a DataSet instance from a text file.

write_dataset_to_hdf5(dataset, hdf_group)

Write the specified dataset to the specified HDF5 group.

write_dataset_to_text(dataset, fh)

Write the specified dataset to a text file.

Classes

DataSet(name[, shape, dtype, data])

A dataset is a series of values obtained during a measurement.

class qmi.data.dataset.DataSet(name: str, shape: tuple[int, ...] | None = None, dtype: dtype | type | None = None, data: ndarray | None = None)

A dataset is a series of values obtained during a measurement.

A dataset contains an array of values in the form of an N-dimensional Numpy array.

For raw datasets without measurement axes, the array may be one-dimensional (n,) for a single data column, or two-dimensional (nrow, ncol) for tabular data with multiple columns.

For axis-based datasets, the last axis of the array acts as a “column index” while the first axes represent independent variables or iterations in the measurement. Each of these axes may have an optional label, a physical unit, and a mapping of array indices to values on the physical axis.
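The axis-based layout described above can be sketched with a plain Numpy array. This is only an illustration of the indexing convention, not code taken from QMI; the sizes and the variable names (n_sweep, n_rep, n_col) are hypothetical.

```python
import numpy as np

# Hypothetical measurement: 4 sweep points, 3 repetitions, 2 data columns.
n_sweep, n_rep, n_col = 4, 3, 2
data = np.zeros((n_sweep, n_rep, n_col), dtype=np.float64)

# The leading axes index the independent variable and the iteration.
# The last axis selects the data column.
data[1, 2, 0] = 0.5   # column 0 at sweep point 1, repetition 2
data[1, 2, 1] = -1.0  # column 1 at the same point

# All values of column 0 across both measurement axes.
column0 = data[..., 0]
```

Slicing on the last axis yields one column with the shape of the measurement axes, here (4, 3).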

Each data column may have an associated label and physical unit.

A dataset may have attributes. Each attribute has a name, which is a short string, unique to the dataset. Each attribute has a value which may be a string or a number.

Reading or changing values in the dataset is done by directly accessing the Numpy array in the DataSet instance. For example:

dataset.data[2, 0:5] += 1

The following fields exist inside a DataSet instance. Application code may read or modify the contents of these fields directly. However, the shape and data type of these fields must not be changed.
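As a minimal sketch of this rule, the snippet below uses a plain Numpy array as a stand-in for dataset.data (an assumption made so the example runs without QMI). It shows updates that keep the shape and data type intact.

```python
import numpy as np

# Stand-in for dataset.data; a real DataSet instance would provide this array.
data = np.zeros((3, 8), dtype=np.float64)
shape_before, dtype_before = data.shape, data.dtype

# In-place updates change values only, never the shape or dtype.
data[2, 0:5] += 1      # update a slice
data[:] = data * 2.0   # overwrite all values in the existing buffer

# By contrast, rebinding the field to a new array (for example the result
# of data.astype(np.float32) or data.reshape(...)) would change the data
# type or shape, which the text above rules out.
unchanged = (data.shape == shape_before) and (data.dtype == dtype_before)
```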

Internal Variables:
  • name – Name of the dataset.

  • data – Numpy array containing the actual data.

  • timestamp – POSIX time stamp associated with the data.

  • axis_label – List of strings specifying labels for the measurement axes.

  • axis_unit – List of strings specifying units for the measurement axes.

  • axis_scale – List of optional 1D Numpy arrays specifying value mappings for the measurement axes.

  • column_label – List of strings specifying column labels.

  • column_unit – List of strings specifying column units.

  • attrs – Dictionary of application-specific attributes.

The entire dataset is kept in memory (RAM). This makes the dataset class unsuitable for very large amounts of data.

set_axis_label(axis: int, label: str) → None

Specify an axis label.

Parameters:
  • axis – Axis number (0, 1, …).

  • label – Label string of the axis.

set_axis_unit(axis: int, unit: str) → None

Specify the physical unit for an axis.

Parameters:
  • axis – Axis number (0, 1, …).

  • unit – Unit string of the axis.

set_axis_name(axis: int, name: str) → None

Specify an axis ‘long’ name.

Parameters:
  • axis – Axis number (0, 1, …).

  • name – ‘Long’ name string of the axis.

set_axis_scale(axis: int, scale: ndarray) → None

Specify a mapping from array indices to physical values along an axis.

Parameters:
  • axis – Axis to which the mapping applies (the first axis has number 0).

  • scale – 1D Numpy array of values along the axis. The length must match the size of the axis.
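The length requirement on scale can be met by deriving the array length from the data shape. The helper below, make_linear_scale, is a hypothetical name introduced for this sketch and is not part of QMI; the physical range 0.0 to 2.0 is likewise made up.

```python
import numpy as np


def make_linear_scale(data: np.ndarray, axis: int,
                      start: float, stop: float) -> np.ndarray:
    """Return a 1D scale whose length equals the size of the given axis."""
    return np.linspace(start, stop, data.shape[axis])


# Stand-in for dataset.data: 5 points on axis 0, 3 on axis 1, 2 columns.
data = np.zeros((5, 3, 2))
scale = make_linear_scale(data, axis=0, start=0.0, stop=2.0)

# scale[i] is the physical value that belongs to index i along axis 0.
# With a DataSet instance, the scale would then be attached with
# dataset.set_axis_scale(0, scale).
```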

set_column_label(col: int, label: str) → None

Specify a label for a column in a multi-column data set.

Parameters:
  • col – Column number (0, 1, …).

  • label – Column label string.

set_column_unit(col: int, unit: str) → None

Specify a physical unit for a column in a multi-column data set.

Parameters:
  • col – Column number (0, 1, …).

  • unit – Column unit string.

set_column_name(col: int, name: str) → None

Specify a name for a column in a multi-column data set.

Parameters:
  • col – Column number (0, 1, …).

  • name – Descriptive name for column data.

qmi.data.dataset.write_dataset_to_hdf5(dataset: DataSet, hdf_group: h5py.Group | h5netcdf.Group | h5py.File | h5netcdf.File) → None

Write the specified dataset to the specified HDF5 group.

The dataset “name” field determines the name of the corresponding HDF5 dataset. An error occurs if the HDF5 group already contains a dataset with the same name.

Note that this function may create additional supporting datasets in the HDF5 group if the DataSet instance uses axis scales. In this case, HDF5 datasets named “<datasetname>_axisN_scale” will be created in addition to the main dataset.

Parameters:
  • dataset – DataSet instance to write to HDF5.

  • hdf_group – HDF5 File or Group instance to which the dataset is written.
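Because writing fails when the dataset name is already taken, it can help to list the names a write is expected to use before calling the function. The sketch below is pure Python; expected_hdf5_names and find_conflicts are hypothetical helpers, and it assumes that "N" in the naming pattern is the axis number, which this reference does not state explicitly. Whether a clash on a supporting scale dataset also raises an error is not documented here, so the check treats it as a possible conflict only.

```python
def expected_hdf5_names(dataset_name: str, scaled_axes: list[int]) -> list[str]:
    """List the HDF5 dataset names a write is expected to create.

    Assumption: "N" in "<datasetname>_axisN_scale" is the axis number.
    """
    names = [dataset_name]
    names += [f"{dataset_name}_axis{axis}_scale" for axis in scaled_axes]
    return names


def find_conflicts(existing: set[str], wanted: list[str]) -> list[str]:
    """Return the wanted names that are already present, in order."""
    return [name for name in wanted if name in existing]


# Hypothetical example: a dataset named "sweep" with scales on axes 0 and 1,
# to be written into a group that already holds "sweep_axis0_scale".
wanted = expected_hdf5_names("sweep", [0, 1])
conflicts = find_conflicts({"calibration", "sweep_axis0_scale"}, wanted)
```

With h5py, the set of existing names could be obtained from set(hdf_group.keys()).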

qmi.data.dataset.read_dataset_from_hdf5(parent: h5py.File | h5netcdf.File | h5py.Group | h5netcdf.Group | h5py.Dataset | h5netcdf.Variable, container: h5py.File | h5netcdf.File | h5py.Group | h5netcdf.Group | None = None) → DataSet

Extract a QMI DataSet instance from the specified HDF5 dataset (group).

Note that this function may fetch additional HDF5 datasets from the parent HDF5 group if the dataset uses dimension scales.

Parameters:
  • parent – HDF5 file/group container, or a child dataset for backwards compatibility.

  • container – Optional explicit parent file/group if parent is a child dataset.

Returns:

DataSet instance.

Return type:

DataSet

qmi.data.dataset.convert_to_qmi_dataset(parent: h5py.File | h5netcdf.File | h5py.Group | h5netcdf.Group | h5py.Dataset | h5netcdf.Variable) → DataSet

Convert an HDF5 dataset, located in a group or in the file root, to a QMI dataset.

If the input is an h5py.Dataset or h5netcdf.Variable, the dataset can have one or more dimensions.

If the input is an h5py.Group or h5netcdf.Group that contains multiple datasets, the dataset attributes are inspected to identify any scaled axes, columns, and data axes. If these can be determined, the group is converted into a single QMI dataset with one or more axes and columns. A single dataset is converted to a 1D dataset.

If the input is an h5py.File or h5netcdf.File that contains no groups, it is handled in the same way as a group. If the file contains a single group, that group is used and handled as described above. If the file contains multiple groups, an error is raised.

qmi.data.dataset.write_dataset_to_text(dataset: DataSet, fh: TextIO) → None

Write the specified dataset to a text file.

Parameters:
  • dataset – DataSet instance to write to the text file.

  • fh – File handle open for writing in text mode.

qmi.data.dataset.read_dataset_from_text(fh: TextIO) → DataSet

Read a DataSet instance from a text file.

Parameters:

fh – File handle open for reading in text mode.

Returns:

DataSet instance.
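Since both text functions operate on a file handle opened in text mode, a write followed by a read can be exercised in memory. The helper below, roundtrip_via_text, is a hypothetical name and not part of QMI. It assumes that an io.StringIO buffer is accepted wherever a TextIO handle is expected, and it takes the writer and reader as arguments so the sketch runs without QMI installed.

```python
import io
from typing import Callable, TextIO, TypeVar

T = TypeVar("T")


def roundtrip_via_text(obj: T,
                       write: Callable[[T, TextIO], None],
                       read: Callable[[TextIO], T]) -> T:
    """Write obj to an in-memory text buffer and read it back."""
    buf = io.StringIO()
    write(obj, buf)
    buf.seek(0)  # rewind so the reader starts at the beginning
    return read(buf)
```

With QMI available, the call would presumably be roundtrip_via_text(dataset, write_dataset_to_text, read_dataset_from_text), after which the returned DataSet can be compared with the original.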