energy_fault_detector.quick_fault_detection.data_loading

apply_boolean_mapping(data, mapping)

Applies the provided boolean mapping to the data if the mapping is valid.

Return type:

Series
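
As a rough illustration of what applying a boolean mapping to a Series can look like, the sketch below uses pandas `Series.map`. The helper name `map_to_bool` and the choice to raise an error for unmapped values are assumptions made for this example, not the library's implementation.

```python
import pandas as pd


def map_to_bool(data: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: replace each value with its boolean from `mapping`."""
    # Values missing from the mapping become NaN under Series.map,
    # so check for them before casting to bool.
    mapped = data.map(mapping)
    if mapped.isna().any():
        raise ValueError("mapping does not cover all values in data")
    return mapped.astype(bool)


status = pd.Series(["ok", "fault", "ok"])
result = map_to_bool(status, {"ok": True, "fault": False})
```

Here `result` is a boolean Series with the same index as `status`.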

detect_encoding(file_path)

Uses chardet to detect the encoding of the file at file_path.

Return type:

str
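
A minimal sketch of encoding detection with chardet is shown below. The helper name `guess_encoding`, the `default` parameter, and the fallback to UTF-8 when chardet is unavailable or undecided are assumptions for illustration only.

```python
import os
import tempfile

try:
    import chardet  # third-party package; treated as optional in this sketch
except ImportError:
    chardet = None


def guess_encoding(file_path: str, default: str = "utf-8") -> str:
    """Hypothetical helper: guess a file's encoding from its raw bytes."""
    with open(file_path, "rb") as handle:
        raw = handle.read()
    if chardet is None:
        return default
    # chardet.detect returns a dict; "encoding" can be None, e.g. for empty input.
    return chardet.detect(raw).get("encoding") or default


# Write a small sample file, detect its encoding, then clean up.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("time;temperature\n2024-01-01;21.5\n".encode("utf-8"))
guessed = guess_encoding(tmp.name)
os.remove(tmp.name)
```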

detect_separator(file_path, encoding)

Estimates the separator based on the number of occurrences.

Return type:

str
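
One way to estimate a separator by counting occurrences is sketched below. It works on a text sample rather than on a file path, and the helper name `guess_separator` and the candidate list are assumptions for this example; the library's own heuristic may differ.

```python
def guess_separator(text: str, candidates=(",", ";", "\t", "|")) -> str:
    """Hypothetical helper: pick the candidate character that occurs most often."""
    counts = {sep: text.count(sep) for sep in candidates}
    return max(counts, key=counts.get)


# Semicolon-separated sample with decimal commas, as is common in European exports.
sample = "time;power;temperature\n2024-01-01 00:00;1,5;20,1\n"
separator = guess_separator(sample)
```

A plain count can be misled by files in which decimal commas outnumber the real separator, so a more careful heuristic might compare counts per line.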

get_boolean_feature(df, bool_data_column_name=None, boolean_mapping=None)

Extracts a boolean feature from the DataFrame df. If the specified column does not contain boolean values and a boolean mapping is provided, the function attempts to cast the column. If the cast fails, None is returned.

Parameters:
  • df (pd.DataFrame) – DataFrame containing data.

  • bool_data_column_name (Optional[str]) – Name of the column where the boolean feature is located. Default is None.

  • boolean_mapping (Optional[dict]) – Dictionary defining a mapping of non-boolean values in the column bool_data_column_name to booleans. Default is None.

Return type:

Optional[Series]

Returns:

Pandas Series containing booleans or None if casting steps fail or if the column does not exist.
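
The fallback behaviour described above can be sketched roughly as follows. The helper name `extract_bool_column` is hypothetical, and returning None when the column is not boolean and no mapping is given is an assumption of this sketch rather than documented behaviour.

```python
from typing import Optional

import pandas as pd


def extract_bool_column(df: pd.DataFrame, column: Optional[str] = None,
                        mapping: Optional[dict] = None) -> Optional[pd.Series]:
    """Hypothetical helper: return a boolean column, or None if it cannot be built."""
    if column is None or column not in df.columns:
        return None
    values = df[column]
    if pd.api.types.is_bool_dtype(values):
        return values
    if mapping is None:
        return None  # assumption: no cast is attempted without a mapping
    mapped = values.map(mapping)
    # Unmapped values show up as NaN; treat that as a failed cast.
    if mapped.isna().any():
        return None
    return mapped.astype(bool)


df = pd.DataFrame({"status": ["normal", "fault", "normal"], "power": [1.0, 2.0, 3.0]})
feature = extract_bool_column(df, "status", {"normal": True, "fault": False})
missing = extract_bool_column(df, "does_not_exist")
```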

get_sensor_data(df)

Selects all features from df that are either numeric or can be cast to a numeric datatype.

Return type:

DataFrame
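
A possible way to select numeric and numeric-castable columns is sketched below with `pandas.to_numeric`. The helper name `select_numeric` is hypothetical, and dropping any column that contains a non-numeric entry is an assumption of this sketch.

```python
import pandas as pd


def select_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep columns that are, or convert cleanly to, numbers."""
    kept = {}
    for name in df.columns:
        try:
            # With the default errors="raise", non-numeric entries raise an error.
            kept[name] = pd.to_numeric(df[name])
        except (ValueError, TypeError):
            continue
    return pd.DataFrame(kept, index=df.index)


raw = pd.DataFrame({"power": ["1.5", "2.0"], "rpm": [10, 12], "label": ["a", "b"]})
sensors = select_numeric(raw)
```

In this example the string column `power` is converted to floats, while `label` is dropped.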

load_data(csv_data_path, train_test_column_name=None, train_test_mapping=None, time_column_name=None, status_data_column_name=None, status_mapping=None)

Loads data from csv_data_path and splits it into numerical data and normal_index. A train-test split is performed if train_test_column_name is not None.

Parameters:
  • csv_data_path (str) – Path to a CSV file containing tabular data, which must include training data for the autoencoder. The file can also contain test data for evaluation, but in that case train_test_column_name and optionally train_test_mapping must be provided.

  • train_test_column_name (Optional[str]) – Name of the column which specifies which part of the data in csv_data_path is training data and which is test data. If this column does not contain boolean values or values which can be cast to booleans, train_test_mapping must be provided. Default is None.

  • train_test_mapping (Optional[dict]) – Dictionary which defines a mapping of all non-boolean values in the train/test column to booleans. Keys of the dictionary must be values from that column, and they must have a datatype which can be cast to the column's datatype. Values of this dictionary must be booleans or at least castable to booleans. Default is None.

  • time_column_name (Optional[str]) – Name of the column containing timestamp information. Default is None.

  • status_data_column_name (Optional[str]) – Name of the column which specifies the status of each row in csv_data_path. The status defines which rows represent normal behavior (i.e. which rows can be used for the autoencoder training) and which rows contain anomalous behavior. If this column does not contain boolean values, status_mapping must be provided. If status_data_column_name is not provided, all rows in csv_data_path are assumed to be normal and a warning is logged. Default is None.

  • status_mapping (Optional[dict]) – Dictionary which defines a mapping of all non-boolean values in the status column to booleans. Keys of the dictionary must be values from the status column, and they must have a datatype which can be cast to the datatype of that column. Values of this dictionary must be booleans or at least castable to booleans. Default is None.

Return type:

Tuple[DataFrame, Series, Optional[DataFrame]]

Returns: tuple

  • train_data (pd.DataFrame) – Numerical data of the training section of the data.

  • normal_index (pd.Series) – Boolean series specifying which samples of the training data are normal.

  • test_data (Optional[pd.DataFrame]) – Numerical data of the test section of the data. This is None unless train_test_column_name is given.
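
The split into training data, normal index, and test data can be pictured with the pandas sketch below. The column names, the values, and the convention that True marks training rows are all assumptions for illustration; the sketch does not call the library.

```python
import pandas as pd

# Hypothetical table with one sensor, a train/test flag and a status column.
frame = pd.DataFrame(
    {
        "power": [1.0, 1.1, 5.0, 1.2],
        "is_train": [True, True, True, False],
        "status": ["normal", "normal", "fault", "normal"],
    }
)

# Map the status column to booleans, then split on the train/test flag.
normal = frame["status"].map({"normal": True, "fault": False})
train_mask = frame["is_train"]

train_data = frame.loc[train_mask, ["power"]]   # numeric training rows
normal_index = normal[train_mask]               # which training rows are normal
test_data = frame.loc[~train_mask, ["power"]]   # numeric test rows
```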

load_train_test_data(csv_data_path, csv_test_data_path=None, train_test_column_name=None, train_test_mapping=None, time_column_name=None, status_data_column_name=None, status_mapping=None)

Extracts numerical training and test data from CSV files and provides a normal index for the training data. If test data is given through two sources, both are fused into one test data set. If no test data is provided, an exception is raised.

Parameters:
  • csv_data_path (str) – Path to a CSV file containing tabular data, which must include training data for the autoencoder. The file can also contain test data for evaluation, but in that case train_test_column_name and optionally train_test_mapping must be provided.

  • csv_test_data_path (Optional[str]) – Path to a CSV file containing test data for evaluation. If test data is provided in both ways (i.e. via csv_test_data_path and via csv_data_path with train_test_column_name), both test data sets are fused into one. Default is None.

  • train_test_column_name (Optional[str]) – Name of the column which specifies which part of the data in csv_data_path is training data and which is test data. If this column does not contain boolean values or values which can be cast to booleans, train_test_mapping must be provided. Default is None.

  • train_test_mapping (Optional[dict]) – Dictionary which defines a mapping of all non-boolean values in the train/test column to booleans. Keys of the dictionary must be values from that column, and they must have a datatype which can be cast to the column's datatype. Values of this dictionary must be booleans or at least castable to booleans. Default is None.

  • time_column_name (Optional[str]) – Name of the column containing timestamp information. Default is None.

  • status_data_column_name (Optional[str]) – Name of the column which specifies the status of each row in csv_data_path. The status defines which rows represent normal behavior (i.e. which rows can be used for the autoencoder training) and which rows contain anomalous behavior. If this column does not contain boolean values, status_mapping must be provided. If status_data_column_name is not provided, all rows in csv_data_path are assumed to be normal and a warning is logged. Default is None.

  • status_mapping (Optional[dict]) – Dictionary which defines a mapping of all non-boolean values in the status column to booleans. Keys of the dictionary must be values from the status column, and they must have a datatype which can be cast to the datatype of that column. Values of this dictionary must be booleans or at least castable to booleans. Default is None.

Return type:

Tuple[DataFrame, Series, DataFrame]

Returns: tuple

  • train_data (pd.DataFrame) – Contains training data for the AnomalyDetector (only numeric values).

  • train_normal_index (pd.Series) – Contains boolean information about which rows of train_data are normal and which contain anomalous behavior.

  • test_data (pd.DataFrame) – Contains test data for the AnomalyDetector (only numeric values).
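
Fusing two sources of test data can be sketched with `pandas.concat`, as below. The frames, their values, and the use of `ValueError` for the missing-data case are assumptions for illustration, not the library's implementation.

```python
import pandas as pd

# Test rows split off from the main file (hypothetical values).
test_from_split = pd.DataFrame({"power": [1.2, 1.3]}, index=[3, 4])
# Test rows read from a separate test file (hypothetical values).
test_from_file = pd.DataFrame({"power": [4.8, 5.1]}, index=[5, 6])

# Keep only the sources that were actually provided, then stack them row-wise.
parts = [part for part in (test_from_split, test_from_file) if part is not None]
if not parts:
    raise ValueError("no test data provided")
fused_test_data = pd.concat(parts)
```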

read_csv_file(csv_data_path, time_column_name)

Checks whether the CSV file exists and reads it into a DataFrame after determining the file encoding and the separator. If time_column_name is not None, this column is set as the index. If csv_data_path does not point to a file, an exception is raised.

Parameters:
  • csv_data_path (str) – Path to a csv-file containing data.

  • time_column_name (Union[str, None]) – Name of the time stamp column in the data.

Returns:

Contents of csv_data_path

Return type:

pd.DataFrame
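
The reading steps described above can be approximated with the sketch below. The helper name `read_table`, its `encoding` and `sep` parameters (standing in for the detected values), and the use of `FileNotFoundError` are assumptions for this example.

```python
import os
import tempfile
from typing import Optional

import pandas as pd


def read_table(path: str, time_column: Optional[str] = None,
               encoding: str = "utf-8", sep: str = ",") -> pd.DataFrame:
    """Hypothetical helper: read a CSV file and optionally index it by time column."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} is not a file")
    df = pd.read_csv(path, encoding=encoding, sep=sep)
    if time_column is not None:
        df = df.set_index(time_column)
    return df


# Write a small sample file, load it, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("time,power\n2024-01-01,1.5\n2024-01-02,1.7\n")
loaded = read_table(tmp.name, time_column="time")
os.remove(tmp.name)
```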

validate_mapping(mapping, data_type)

Validates the data types of keys and values in the mapping dictionary.

Return type:

bool
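
A simple check of keys and values in a mapping might look like the sketch below. The helper name `mapping_is_valid` is hypothetical, and requiring values to be actual booleans is stricter than the "castable to booleans" rule stated for the mappings; both are assumptions of this example.

```python
def mapping_is_valid(mapping: dict, data_type: type) -> bool:
    """Hypothetical check: keys must cast to `data_type`, values must be booleans."""
    for key, value in mapping.items():
        try:
            data_type(key)  # e.g. int("1") succeeds, int("on") raises ValueError
        except (TypeError, ValueError):
            return False
        if not isinstance(value, bool):
            return False
    return True


valid = mapping_is_valid({"1": True, "0": False}, int)
bad_key = mapping_is_valid({"on": True}, int)
bad_value = mapping_is_valid({1: "yes"}, int)
```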