energy_fault_detector.data_preprocessing.column_selector

class ColumnSelector(max_nan_frac_per_col=0.05, features_to_exclude=None)

Bases: DataTransformer

Selects columns by dropping the features listed in features_to_exclude and any column whose fraction of NaN values exceeds max_nan_frac_per_col.

Parameters:
  • max_nan_frac_per_col (float) – maximum fraction of NaN values allowed per column. A column whose NaN fraction exceeds this value is dropped. Defaults to 0.05.

  • features_to_exclude (List[str]) – names of features that should always be dropped. Defaults to None.
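
The two criteria above can be illustrated with plain pandas. This is a minimal sketch of the rule as described in the parameter documentation, not the library's implementation; the column names and the data are made up for illustration.

```python
import numpy as np
import pandas as pd

max_nan_frac_per_col = 0.05
features_to_exclude = ["turbine_id"]  # hypothetical column name

n_rows = 40
df = pd.DataFrame({
    "wind_speed": np.arange(n_rows, dtype=float),         # no NaN
    "power": [np.nan] + [1.0] * (n_rows - 1),             # 1/40 = 0.025
    "temperature": [np.nan] * 4 + [20.0] * (n_rows - 4),  # 4/40 = 0.10
    "turbine_id": [7.0] * n_rows,                         # excluded by name
})

# Fraction of NaN values per column (mean of a boolean mask).
nan_frac = df.isna().mean()

# A column is dropped if it is excluded by name or if its NaN fraction
# exceeds the threshold.
dropped = [
    col for col in df.columns
    if col in features_to_exclude or nan_frac[col] > max_nan_frac_per_col
]
kept = [col for col in df.columns if col not in dropped]

print(kept)     # ['wind_speed', 'power']
print(dropped)  # ['temperature', 'turbine_id']
```

Note that "power" contains a NaN but stays, because its NaN fraction (0.025) does not exceed the 0.05 threshold.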

feature_names_in_

list of column names in input.

n_features_in_

number of columns in input.

feature_names_out_

list of selected column names (the columns to keep).

columns_dropped_

list of columns that were dropped.

fit(x, y=None)

Determine which columns to keep, based on the training data x.

Parameters:
  • x (DataFrame) – data to filter based on NaN fractions

  • y (Optional[array]) – target variable, currently unused.

Return type:

ColumnSelector
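
To show how fit might populate the attributes listed above, here is a hypothetical stand-in class. SketchColumnSelector is not the library class; it assumes that columns_dropped_ contains every input column that is not in feature_names_out_, which the documentation suggests but does not state explicitly.

```python
import numpy as np
import pandas as pd


class SketchColumnSelector:
    """Hypothetical stand-in for illustration, not the library class."""

    def __init__(self, max_nan_frac_per_col=0.05, features_to_exclude=None):
        self.max_nan_frac_per_col = max_nan_frac_per_col
        self.features_to_exclude = features_to_exclude or []

    def fit(self, x, y=None):
        # y is accepted for API compatibility and ignored.
        self.feature_names_in_ = list(x.columns)
        self.n_features_in_ = len(self.feature_names_in_)
        nan_frac = x.isna().mean()
        self.feature_names_out_ = [
            col for col in self.feature_names_in_
            if col not in self.features_to_exclude
            and nan_frac[col] <= self.max_nan_frac_per_col
        ]
        # Assumption: everything not kept counts as dropped.
        self.columns_dropped_ = [
            col for col in self.feature_names_in_
            if col not in self.feature_names_out_
        ]
        return self  # returning the instance allows call chaining


train = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, np.nan, 3.0, 4.0],  # half of the values are NaN
    "c": [0.1, 0.2, 0.3, 0.4],
})
selector = SketchColumnSelector(features_to_exclude=["c"]).fit(train)

print(selector.n_features_in_)      # 3
print(selector.feature_names_out_)  # ['a']
print(selector.columns_dropped_)    # ['b', 'c']
```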

get_feature_names_out(input_features=None)

Returns the list of feature names in the output.

Return type:

List[str]

inverse_transform(x)

Does nothing for the column selector, because dropped columns are not reconstructed.

Return type:

DataFrame

transform(x)

Drop the columns identified during fit from the DataFrame x.

Return type:

DataFrame
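
The behaviour of transform and inverse_transform on new data can be sketched as follows. The two helper functions are hypothetical stand-ins, and the sketch assumes that transform applies the column list determined during fit without re-checking NaN fractions in the new data; the documentation does not confirm this detail.

```python
import numpy as np
import pandas as pd

# Column list as it might look after fitting (hypothetical values).
feature_names_out = ["a", "c"]


def sketch_transform(x, feature_names_out):
    """Keep only the columns selected at fit time."""
    return x[feature_names_out]


def sketch_inverse_transform(x):
    """Return the data as is; dropped columns are not reconstructed."""
    return x


new_data = pd.DataFrame({
    "a": [1.0, np.nan],
    "b": [5.0, 6.0],        # dropped at fit time in this scenario
    "c": [np.nan, np.nan],  # kept, although all new values are NaN
})

reduced = sketch_transform(new_data, feature_names_out)
restored = sketch_inverse_transform(reduced)

print(list(reduced.columns))   # ['a', 'c']
print(list(restored.columns))  # ['a', 'c']
```

Column "b" does not reappear after the inverse step, which matches the note that dropped columns are not reconstructed.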