energy_fault_detector.data_preprocessing.counter_diff_transformer

class CounterDiffTransformer(counters, compute_rate=False, reset_strategy='zero', rollover_values=None, small_negative_tolerance=0.0, fill_first='nan', keep_original=False, gap_policy='mask', max_gap_seconds=None, max_gap_factor=3.0)

Bases: DataTransformer

Transform monotonic counter columns into per-sample increments (default) or per-second rates (if compute_rate=True).

Counter resets and rollovers are handled according to reset_strategy, and values after large time gaps can be masked, which helps avoid misleading diffs or rates caused by missing data.

Parameters:
  • counters (List[str]) – List of counter column names to transform.

  • compute_rate (bool) – If True, output per-second rates (increment / dt). If False (default), output per-sample increments.

  • reset_strategy (str) –

    One of {'zero', 'rollover', 'nan', 'auto'}:

    • 'zero' (default): if diff < 0, treat as reset-to-zero; increment = current_value.

    • 'rollover': if diff < 0, increment = current_value + (rollover_value - previous_value).

    • 'nan': if diff < 0, set increment to NaN.

    • 'auto': use 'rollover' if rollover_values contains the counter; otherwise 'zero'.

  • rollover_values (Optional[Dict[str, float]]) – Optional mapping counter -> known max value (used by ‘rollover’ or ‘auto’).

  • small_negative_tolerance (float) – Treat small negative diffs (abs(diff) <= small_negative_tolerance) as 0 (noise). Default: 0.0.

  • fill_first (str) – One of {'nan', 'zero'}. How to fill the first sample, where the diff is undefined. Default: 'nan'.

  • keep_original (bool) – If True, keep original counters alongside new outputs. If False, drop them.

  • gap_policy (str) –

    One of {'mask', 'ignore'}:

    • 'mask' (default): set output to NaN for rows where time delta > threshold.

    • 'ignore': do nothing special for large gaps.

  • max_gap_seconds (Optional[float]) – Explicit threshold (in seconds) for gap masking. If provided, overrides max_gap_factor.

  • max_gap_factor (float) – If max_gap_seconds is None, use threshold = max_gap_factor * median(dt). Default: 3.0.
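
The reset_strategy and small_negative_tolerance rules above can be sketched for a single pair of consecutive readings. This is a minimal, dependency-free illustration of the documented formulas, not the library's implementation; the helper names counter_increment and resolve_strategy are hypothetical, and applying the tolerance check before the reset strategy is an assumption.

```python
import math
from typing import Dict, Optional


def counter_increment(
    previous: float,
    current: float,
    strategy: str = "zero",
    rollover_value: Optional[float] = None,
    tolerance: float = 0.0,
) -> float:
    """Increment between two consecutive counter readings (hypothetical helper)."""
    diff = current - previous
    if diff >= 0:
        return diff
    # Small negative diffs are treated as noise and become 0
    # (assumed to take precedence over the reset strategy).
    if abs(diff) <= tolerance:
        return 0.0
    if strategy == "zero":
        # Counter restarted from zero: everything counted since is `current`.
        return current
    if strategy == "rollover":
        if rollover_value is None:
            raise ValueError("'rollover' needs a known rollover value")
        # Remaining distance to the max value plus the new reading.
        return current + (rollover_value - previous)
    if strategy == "nan":
        return math.nan
    raise ValueError(f"unknown strategy: {strategy!r}")


def resolve_strategy(
    counter: str, strategy: str, rollover_values: Optional[Dict[str, float]]
) -> str:
    """Map 'auto' to 'rollover' if a max value is known for the counter, else 'zero'."""
    if strategy != "auto":
        return strategy
    return "rollover" if rollover_values and counter in rollover_values else "zero"
```

For a counter that drops from 990 to 10, 'zero' yields 10, 'rollover' with a rollover value of 1000 yields 20, and 'nan' yields NaN.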

Notes

  • A DatetimeIndex is required if compute_rate=True or gap_policy='mask'.

  • inverse_transform does not reconstruct counter values from the diffs or rates; see inverse_transform() below for what it returns.

Examples

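  • Diffs: [0, 1, 3, 0 (reset), 2] -> [NaN|0, 1, 2, 0|NaN, 2]. The first entry is NaN or 0 depending on fill_first; the reset entry is 0 with reset_strategy='zero' (increment = current_value = 0) or NaN with reset_strategy='nan'.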

  • Rates: increment / dt (in seconds), with large-gap rows optionally masked to NaN.
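
The Diffs example can be reproduced with a short plain-Python sketch. It assumes only the 'zero' and 'nan' reset strategies and the two fill_first options described above; the function name diffs is hypothetical and the code is an illustration, not the transformer itself.

```python
import math
from typing import List


def diffs(
    values: List[float], reset_strategy: str = "zero", fill_first: str = "nan"
) -> List[float]:
    """Per-sample increments of a counter series (illustrative sketch)."""
    if not values:
        return []
    # The first sample has no predecessor, so its diff is filled explicitly.
    out = [math.nan if fill_first == "nan" else 0.0]
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        if d < 0:
            # Negative diff: reset-to-zero keeps the current value, 'nan' masks it.
            d = cur if reset_strategy == "zero" else math.nan
        out.append(float(d))
    return out


counter = [0, 1, 3, 0, 2]
print(diffs(counter, reset_strategy="zero", fill_first="zero"))
# [0.0, 1.0, 2.0, 0.0, 2.0]
print(diffs(counter, reset_strategy="nan", fill_first="nan"))
# [nan, 1.0, 2.0, nan, 2.0]
```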

fit(x, y=None)

Validate inputs and compute output schema.

This method validates the time index (when needed), stores the list of counters that are present in the input, and computes the output column layout such that transform() can reproduce the same order deterministically.

Parameters:
  • x (DataFrame) – Input DataFrame. Requires a DatetimeIndex if compute_rate=True or gap_policy='mask'.

  • y (Optional[Series]) – Unused. Present for estimator interface compatibility.

Return type:

CounterDiffTransformer

Returns:

self

Raises:

ValueError – If a DatetimeIndex is required but missing or non-monotonic.

get_feature_names_out(input_features=None)

Return the output feature names determined in fit().

Parameters:

input_features (Optional[List[str]]) – Unused. Present for compatibility with sklearn API.

Return type:

List[str]

Returns:

List of output column names.

inverse_transform(x)

If original counter columns are present, drop the derived columns and restore original feature order. Otherwise, returns the input as is.

Parameters:

x (DataFrame) – Input DataFrame.

Return type:

DataFrame

Returns:

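The input DataFrame with the derived columns dropped and the original column order restored if the original counter columns are present; otherwise the input DataFrame unchanged.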

transform(x)

Transform counters into diffs or rates, with optional gap masking.

For each configured counter:
  1. Compute per-sample increment with reset handling.

  2. If compute_rate=True, divide by dt seconds.

  3. If gap_policy='mask', set values to NaN where dt > gap_threshold.
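
Steps 2 and 3 can be sketched as follows, using the standard library instead of pandas to stay dependency-free. The function name rates_with_gap_mask is hypothetical, timestamps are assumed to be strictly increasing, and leaving the first row as NaN (its dt is undefined) is an assumption of this sketch rather than documented behaviour.

```python
import math
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


def rates_with_gap_mask(
    timestamps: List[datetime],
    increments: List[float],
    max_gap_seconds: Optional[float] = None,
    max_gap_factor: float = 3.0,
) -> List[float]:
    """Divide increments by dt (seconds) and mask rows that follow a large gap."""
    # dt between consecutive samples; undefined (NaN) for the first row.
    dts = [math.nan] + [
        (b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])
    ]
    # An explicit threshold overrides the factor-based one.
    if max_gap_seconds is not None:
        threshold = max_gap_seconds
    else:
        threshold = max_gap_factor * median(dts[1:])
    rates = []
    for inc, dt in zip(increments, dts):
        if math.isnan(dt) or dt > threshold:
            rates.append(math.nan)  # first row or large gap: mask
        else:
            rates.append(inc / dt)
    return rates


start = datetime(2024, 1, 1)
timestamps = [start + timedelta(seconds=s) for s in (0, 10, 20, 80, 90)]
increments = [math.nan, 5.0, 10.0, 30.0, 5.0]
print(rates_with_gap_mask(timestamps, increments))
# [nan, 0.5, 1.0, nan, 0.5]
```

Here the median dt is 10 s, so the default threshold is 30 s and the row after the 60 s gap is masked. Passing max_gap_seconds=120 would keep that row and yield a rate of 0.5.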

Parameters:

x (DataFrame) – Input DataFrame. Requires a DatetimeIndex if compute_rate=True or gap_policy='mask'.

Return type:

DataFrame

Returns:

A DataFrame with transformed columns appended (if keep_original=True) or replacing the original counters (if keep_original=False). Column order matches fit()’s schema.

Raises:

ValueError – If a DatetimeIndex is required but missing or non-monotonic.