energy_fault_detector.data_preprocessing.duplicate_value_to_nan

class DuplicateValuesToNan(value_to_replace=0.0, n_max_duplicates=144, features_to_exclude=None)

Bases: DataTransformer

Replaces long runs of a repeated value with NaN.

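In many data sets, a value of zero can actually stand for a missing value (NaN). This transformer therefore replaces runs of consecutive identical values with NaN once they continue for more than n_max_duplicates samples. Values other than zero can be targeted by setting value_to_replace.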

Example:

value_to_replace = 0
n_max_duplicates = 2
Input: [0, 0, 0, 1, 2, 1, 3, 5, 1, 0, 0, 0, 0, 0, 7]
Output: [0, 0, np.nan, 1, 2, 1, 3, 5, 1, 0, 0, np.nan, np.nan, np.nan, 7]
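The rule in the example can be sketched with a single pass that tracks the length of the current run: the first n_max_duplicates occurrences of value_to_replace in a run are kept, and every later one becomes NaN. This is a minimal sketch under our own assumptions; `replace_long_runs` is a hypothetical helper, not part of the package, and the library's actual implementation may differ.

```python
import math


def replace_long_runs(values, value_to_replace=0.0, n_max_duplicates=144):
    """Hypothetical helper: NaN out repeats of `value_to_replace` beyond the
    first `n_max_duplicates` in each run of consecutive equal values."""
    result = []
    run_length = 0
    previous = object()  # sentinel that compares unequal to every value
    for v in values:
        # Extend the current run if the value repeats, otherwise start a new one.
        run_length = run_length + 1 if v == previous else 1
        previous = v
        if v == value_to_replace and run_length > n_max_duplicates:
            result.append(math.nan)
        else:
            result.append(v)
    return result


print(replace_long_runs([0, 0, 0, 1, 2, 1, 3, 5, 1, 0, 0, 0, 0, 0, 7],
                        value_to_replace=0, n_max_duplicates=2))
```

Running it on the input above reproduces the documented output, with NaN in the third position of the first run of zeros and in the last three positions of the second run.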

Parameters:

value_to_replace (float) – The value to replace with NaN (default: 0.).
n_max_duplicates (int) – The maximum number of duplicates allowed before replacing with NaN (default: 144).
features_to_exclude (List[str]) – List of features to leave untransformed (default: None). Useful for sensors whose readings legitimately stay constant for long periods.
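The default n_max_duplicates=144 is a sample count, not a duration. As a hedged illustration: if the data were sampled every 10 minutes (an assumption, common for SCADA data but not stated here), 144 consecutive samples would cover exactly one day. The hypothetical helper below converts a time span into a sample count using only the standard library.

```python
from datetime import timedelta


def duplicates_for_span(span: timedelta, sampling_interval: timedelta) -> int:
    """Hypothetical helper: number of consecutive samples covering `span`."""
    # timedelta // timedelta performs floor division and returns an int.
    return span // sampling_interval


# Assumption: 10-minute sampling interval; one day then equals 144 samples.
print(duplicates_for_span(timedelta(days=1), timedelta(minutes=10)))  # 144
```

Choosing the sample count from a physically meaningful span keeps the threshold consistent if the sampling rate changes.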

feature_names_in_: list of column names in input.

feature_names_out_: list of columns in output.

fit(x, y=None)

Sets the input and output feature names (feature_names_in_ and feature_names_out_).

Parameters:

x (Union[array, DataFrame]) – The input data as a numpy array or pandas DataFrame.
y (Optional[array]) – The target data as a numpy array (optional).

Returns:

The fitted DuplicateValuesToNan transformer.

Return type:

self
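As a sketch of the bookkeeping that fit and get_feature_names_out describe, the hypothetical class below records the input column names and returns them unchanged as output names, on the assumption that replacing values with NaN neither adds nor drops columns. The positional names used for plain numpy arrays (`x0`, `x1`, ...) are our own assumption; the library's DataTransformer base class may handle arrays differently.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


class FeatureNameTracker:
    """Hypothetical stand-in illustrating what `fit` records; not the library class."""

    def fit(self, x, y=None):
        # y is accepted for interface compatibility and ignored in this sketch.
        if isinstance(x, pd.DataFrame):
            self.feature_names_in_ = list(x.columns)
        else:
            # Assumption: numpy arrays get positional names x0, x1, ...
            n_columns = np.asarray(x).shape[1]
            self.feature_names_in_ = [f"x{i}" for i in range(n_columns)]
        # Values are replaced in place, so the output keeps the same columns.
        self.feature_names_out_ = list(self.feature_names_in_)
        return self

    def get_feature_names_out(self, input_features: Optional[List[str]] = None) -> List[str]:
        return self.feature_names_out_


df = pd.DataFrame({"power": [0.0, 1.0], "wind_speed": [3.2, 4.1]})
print(FeatureNameTracker().fit(df).get_feature_names_out())  # ['power', 'wind_speed']
```

Returning self from fit allows the usual chained call shown in the last line.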

get_feature_names_out(input_features=None)

Returns the list of feature names in the output.

Return type:

List[str]

inverse_transform(x)

Not implemented for this data replacer, since an inverse transformation is not useful here.

Return type:

DataFrame

transform(x)

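Replaces runs of value_to_replace that are duplicated more than self.n_max_duplicates times with NaN: the first self.n_max_duplicates values of each run are kept and the remaining ones are set to NaN. Columns listed in features_to_exclude are left unchanged.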

Parameters:

x (DataFrame) – The input data as a pandas DataFrame.

Returns:

The transformed data with duplicate values replaced with NaN.

Return type:

DataFrame
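A column-wise version of this behaviour can be sketched with pandas: a new run starts wherever a value differs from the previous row, `groupby(...).cumcount()` gives each element's position within its run, and `Series.mask` sets the flagged entries to NaN. The function name `transform_sketch` is hypothetical and this is a minimal sketch under our own assumptions, not the library's code.

```python
from typing import List, Optional

import pandas as pd


def transform_sketch(
    x: pd.DataFrame,
    value_to_replace: float = 0.0,
    n_max_duplicates: int = 144,
    features_to_exclude: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical column-wise version of `transform`; not the library code."""
    excluded = set(features_to_exclude or [])
    out = x.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if col in excluded:
            continue  # e.g. sensors that legitimately stay constant
        s = out[col]
        # Run label increments whenever the value changes from the previous row.
        run_id = (s != s.shift()).cumsum()
        # 0-based position of each element within its run.
        position_in_run = s.groupby(run_id).cumcount()
        mask = (s == value_to_replace) & (position_in_run >= n_max_duplicates)
        out[col] = s.mask(mask)
    return out


df = pd.DataFrame({"power": [0, 0, 0, 5, 0], "status": [0, 0, 0, 0, 0]})
print(transform_sketch(df, value_to_replace=0, n_max_duplicates=2,
                       features_to_exclude=["status"]))
```

In this example the third row of power becomes NaN, while the constant status column is skipped because it is listed in features_to_exclude.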