energy_fault_detector.data_preprocessing.duplicate_value_to_nan
- class DuplicateValuesToNan(value_to_replace=0.0, n_max_duplicates=144, features_to_exclude=None)
Bases: DataTransformer
Replaces duplicate values with NaN.
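In many data sets a zero can stand for a missing value, so a run of zeros is replaced with NaN once it continues for more than n_max_duplicates consecutive entries. The first n_max_duplicates entries of the run are kept. The class can also be used for values other than zero by setting value_to_replace.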
Example:
value_to_replace = 0
n_max_duplicates = 2
Input: [0, 0, 0, 1, 2, 1, 3, 5, 1, 0, 0, 0, 0, 0, 7]
Output: [0, 0, np.nan, 1, 2, 1, 3, 5, 1, 0, 0, np.nan, np.nan, np.nan, 7]
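The replacement rule shown in the example can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation; the helper name replace_long_runs is hypothetical, and it assumes that "duplicates" means consecutive repetitions, as the example suggests.

```python
import math
from typing import List


def replace_long_runs(values: List[float],
                      value_to_replace: float = 0.0,
                      n_max_duplicates: int = 144) -> List[float]:
    """Hypothetical helper: keep the first n_max_duplicates entries of each
    consecutive run of value_to_replace and set the rest of the run to NaN."""
    result = []
    run_length = 0  # length of the current run of value_to_replace
    for value in values:
        if value == value_to_replace:
            run_length += 1
        else:
            run_length = 0  # any other value ends the run
        if run_length > n_max_duplicates:
            result.append(math.nan)
        else:
            result.append(value)
    return result


data = [0, 0, 0, 1, 2, 1, 3, 5, 1, 0, 0, 0, 0, 0, 7]
cleaned = replace_long_runs(data, value_to_replace=0, n_max_duplicates=2)
print(cleaned)
```

With the input from the example, the third zero of the first run and the last three zeros of the second run become NaN, while the isolated repeated 1 values are left alone because they are not value_to_replace.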
- Parameters:
value_to_replace (float) – The value to replace with NaN (default: 0.0).
n_max_duplicates (int) – The maximum number of duplicates allowed before replacing with NaN (default: 144).
features_to_exclude (List[str]) – List of features to not transform. Defaults to None. Some sensors simply do not change for a while and that is ok.
- feature_names_in_
list of column names in input.
- feature_names_out_
list of columns in output.
Initialize the DuplicateValuesToNan transformer.
- fit(x, y=None)
Set feature names in and out.
- Parameters:
x (Union[array, DataFrame]) – The input data as a numpy array or pandas DataFrame.
y (Optional[array]) – The target data as a numpy array (optional).
- Returns:
The fitted DuplicateValuesToNan transformer.
- Return type:
self
- get_feature_names_out(input_features=None)
Returns the list of feature names in the output.
- inverse_transform(x)
Not implemented for the data replacer, since an inverse is not useful here.
- Return type:
DataFrame
- transform(x)
Replace every occurrence of self.value_to_replace that is repeated more than self.n_max_duplicates times in a row with NaN.
- Parameters:
x (DataFrame) – The input data as a pandas DataFrame.
- Returns:
The transformed data with duplicate values replaced with NaN.
- Return type:
DataFrame
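To make the column-wise behavior and the role of features_to_exclude more concrete, here is a minimal pandas sketch. It is an assumption about how such a transform could be written, not the package's actual code: the function name duplicates_to_nan is hypothetical, the transformed columns are assumed to be numeric, and "duplicated" is taken to mean consecutive repetitions.

```python
import pandas as pd


def duplicates_to_nan(df, value_to_replace=0.0, n_max_duplicates=144,
                      features_to_exclude=None):
    """Hypothetical stand-in for the documented transform (not the library code)."""
    excluded = set(features_to_exclude or [])
    out = df.copy()
    for column in out.columns:
        if column in excluded:
            continue  # excluded features are passed through unchanged
        is_target = out[column] == value_to_replace
        # A new run starts wherever the mask changes between True and False.
        run_id = is_target.astype(int).diff().ne(0).cumsum()
        # 1-based position of each row inside its run.
        position_in_run = is_target.groupby(run_id).cumcount() + 1
        too_long = is_target & (position_in_run > n_max_duplicates)
        # Cast to float so NaN can be stored, then blank out the long runs.
        out[column] = out[column].astype(float).mask(too_long)
    return out


df = pd.DataFrame({
    "power": [0, 0, 0, 1, 0],
    "status": [0, 0, 0, 0, 0],
})
result = duplicates_to_nan(df, value_to_replace=0, n_max_duplicates=2,
                           features_to_exclude=["status"])
print(result)
```

In this sketch only the third zero in the power column becomes NaN. The status column stays all zeros because it is listed in features_to_exclude, which matches the documented intent that some sensors legitimately hold a constant value for a while.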