energy_fault_detector.evaluation.care_score

class CAREScore(coverage_beta=0.5, reliability_beta=0.5, coverage_w=1.0, accuracy_w=2.0, earliness_w=1.0, reliability_w=1.0, criticality_threshold=72, min_fraction_anomalous=0.1, ws_start_of_descend=(1, 4), anomaly_detection_method='criticality', *, min_fraction_anomalous_timestamps=None, eventwise_f_score_beta=None, weighted_score_w=None, eventwise_f_score_w=None)

Bases: object

Calculate the CARE score for early fault-detection algorithms.

The CARE score combines Coverage, Accuracy, Reliability and Earliness to evaluate early fault-detection performance (see CARE to Compare: A Real-World Benchmark Dataset for Early Fault Detection in Wind Turbine Data, https://doi.org/10.3390/data9120138). The goal of the CARE score is to evaluate how well a given model separates normal behavior from actionable anomalies (see the glossary below for definitions) that lead to or indicate a fault.

Usage:

For each event in your dataset call evaluate_event. Afterward, call get_final_score to calculate the CARE-score of your model.

Requirements:

Calculating the CARE-score requires at least one evaluated anomaly event and at least one evaluated normal event (see glossary below).

Glossary:

normal behavior: Data points representing expected system behavior.
actionable anomaly: Unexpected system behavior where maintenance could prevent a fault.
non-actionable anomaly: Neither normal behavior nor actionable anomaly (e.g. turbine shut down due to maintenance actions).
anomaly event: A sequence labeled with actionable anomalies within the prediction window.
normal event: A sequence containing normal behavior within the prediction window.
normal_index: Boolean mask separating normal behavior (True) from non-actionable anomaly (False).
Coverage: Pointwise F-score on anomaly-events.
Accuracy: Accuracy on normal-events.
Reliability: Eventwise F-score across evaluated events.
Earliness: Weighted score measuring how early an anomaly is detected.
criticality: Counting measure applied to the prediction of anomaly detection models. Used for criticality-based detection. For a detailed explanation see Algorithm 1 in the paper (https://doi.org/10.3390/data9120138). Choose a criticality threshold based on application needs: e.g. how many anomalous points need to be observed in advance to be actionable? After how many anomalous data points is an anomaly significant?
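
The criticality counter from the glossary can be illustrated with a minimal sketch. It assumes the counting rule is: add 1 for each predicted anomaly at a timestamp with normal expected behaviour, subtract 1 (never below 0) for each non-anomalous prediction at such a timestamp, and leave the counter unchanged where the expected behaviour is not normal. The function name and signature are hypothetical and are not part of energy_fault_detector; Algorithm 1 in the paper remains the authoritative definition.

```python
from typing import List, Optional, Sequence


def criticality_sketch(predicted_anomalies: Sequence[bool],
                       normal_index: Optional[Sequence[bool]] = None) -> List[int]:
    """Illustrative criticality counter (assumed rule, not the library code)."""
    if normal_index is None:
        # Assumption: without a mask, every timestamp counts as normal behaviour.
        normal_index = [True] * len(predicted_anomalies)
    if len(normal_index) != len(predicted_anomalies):
        raise ValueError("predicted_anomalies and normal_index must have equal length")

    criticality = []
    counter = 0
    for is_anomaly, is_normal in zip(predicted_anomalies, normal_index):
        if is_normal:
            if is_anomaly:
                counter += 1                   # anomaly predicted during normal behaviour
            else:
                counter = max(counter - 1, 0)  # no anomaly predicted: decay, floor at 0
        # Timestamps that are not normal leave the counter unchanged.
        criticality.append(counter)
    return criticality


predictions = [True, True, False, True, True, True]
mask = [True, True, True, False, True, True]
crit = criticality_sketch(predictions, mask)
max_criticality = max(crit)
```

In this toy series the fourth timestamp is masked out, so its prediction does not change the counter. An event is then flagged when its maximum criticality reaches the chosen criticality_threshold.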

Initialize CAREScore.

Parameters:

coverage_beta (float) – Beta parameter for Coverage (pointwise F-score). Default: 0.5.
reliability_beta (float) – Beta parameter for Reliability (event-wise F-score). Default: 0.5.
coverage_w (float) – Weight for Coverage (point-wise F-Score) in the final CARE-score. Default: 1.0.
accuracy_w (float) – Weight for Accuracy in the final CARE-score. Default: 2.0.
reliability_w (float) – Weight for Reliability (eventwise F-score) in the final CARE-score. Default: 1.0.
earliness_w (float) – Weight for Earliness (weighted score) in the final CARE-score. Default: 1.0.
anomaly_detection_method (str) – Method used to calculate anomaly detection score. Either ‘criticality’ or ‘fraction’. Default: ‘criticality’.
criticality_threshold (int) – Threshold for criticality-based detection. Default: 72. If the criticality exceeds this threshold, the event will be detected as an anomaly.
min_fraction_anomalous (float) – Threshold for fraction-based detection. Default: 0.1. If the fraction of event data points exceeds this threshold, the event will be detected as an anomaly.
ws_start_of_descend (Tuple[int, int]) – Fraction (numerator, denominator) where weights start to decay for the Earliness-Score (weighted score). Default is (1, 4).

Raises:

ValueError – If anomaly_detection_method is not ‘criticality’ or ‘fraction’.

calculate_avg_accuracy(event_selection=None)

Return the average Accuracy across normal events.

Parameters: event_selection (list[int], optional) – List of event IDs to include. Default: None (use all evaluated events).
Returns: Mean accuracy for selected normal events. Returns numpy.nan if no normal events are selected.
Return type: float

calculate_avg_coverage(event_selection=None)

Return the average Coverage (pointwise F-score) for anomaly events.

Parameters: event_selection (list[int], optional) – List of event IDs to include. Default: None (use all evaluated events).
Returns: Mean f_beta_score for selected anomaly events. Returns numpy.nan if no anomaly events are selected.
Return type: float

calculate_avg_earliness(event_selection=None)

Return the average Earliness (weighted score) for anomaly events.

Parameters: event_selection (list[int], optional) – List of event IDs to include. Default: None (use all evaluated events).
Returns: Mean weighted_score for selected anomaly events. Returns numpy.nan if no anomaly events are selected.
Return type: float

calculate_reliability(event_selection=None, **kwargs)

Compute the Reliability (event-wise F-score) for selected events.

Parameters:

event_selection (list[int], optional) – List of event IDs to include. Default: None (use all evaluated events).
kwargs – Other keyword args for sklearn’s fbeta_score.

Returns:

Event-wise F-score computed with beta=self.reliability_beta. If there are no positive labels or predictions, the result is determined by the zero_division argument of sklearn’s fbeta_score.

Return type:

float
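
As a rough illustration of what an event-wise F-score measures, the sketch below treats each event as one sample: the true label is whether the event is an anomaly event, and the prediction is whether an anomaly was detected for it. It uses the standard F-beta formula in plain Python rather than sklearn’s fbeta_score. Returning 0.0 for an empty denominator is an assumption standing in for the zero_division setting, and the helper name eventwise_f_beta is hypothetical.

```python
from typing import Sequence


def eventwise_f_beta(event_labels: Sequence[str],
                     anomaly_detected: Sequence[bool],
                     beta: float = 0.5) -> float:
    """F-beta over events, one sample per event (illustrative only)."""
    tp = fp = fn = 0
    for label, detected in zip(event_labels, anomaly_detected):
        is_anomaly = label == "anomaly"
        if is_anomaly and detected:
            tp += 1   # anomaly event that was detected
        elif detected:
            fp += 1   # normal event flagged as anomaly
        elif is_anomaly:
            fn += 1   # anomaly event that was missed

    beta_sq = beta ** 2
    denominator = (1 + beta_sq) * tp + beta_sq * fn + fp
    if denominator == 0:
        return 0.0    # assumption: mimic zero_division=0
    return (1 + beta_sq) * tp / denominator


labels = ["anomaly", "anomaly", "normal", "normal"]
detected = [True, False, True, False]
reliability = eventwise_f_beta(labels, detected, beta=0.5)
```

With beta below 1, false alarms on normal events lower the score more than missed anomaly events do, since precision is weighted more heavily than recall.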

static create_ground_truth(event_start, event_end, normal_index, event_label)

Create the ground truth labels based on the event_start, event_end, normal_index and event_label.

Parameters:

event_start (int, pd.Timestamp) – Start index/timestamp of the event.
event_end (int, pd.Timestamp) – End index/timestamp of the event.
normal_index (pd.Series) – Boolean mask indicating normal samples. Must be indexed compatibly with event_start/event_end.
event_label (str) – ‘anomaly’ or ‘normal’. True label indicating the type of the event.

Returns:

Boolean series indexed like normal_index. True indicates anomaly (actionable and non-actionable), False otherwise.

Return type:

pd.Series

Notes

The returned series is sorted by index.
If event_label == ‘anomaly’, values in the interval [event_start:event_end] are set to True.
normal_index is first inverted (anomaly = not normal); the event window is then applied.

evaluate_event(event_start, event_end, event_label, predicted_anomalies, normal_index=None, evaluate_until_event_end=False, event_id=None, ignore_normal_index=False)

Evaluate the prediction of a fault detection model for a single event.

If a normal_index is provided, metrics are computed only for timestamps where normal_index is True (unless ignore_normal_index is True).

The argument evaluate_until_event_end determines which part of the provided data is used for evaluation. Setting it to True or ‘anomaly_only’ can be useful if you expect that normal behaviour may change after a fault.

Parameters:

event_start (int, pd.Timestamp) – Start index/timestamp of the event.
event_end (int, pd.Timestamp) – End index/timestamp of the event.
event_label (str) – True label of the event. This can be either ‘anomaly’ or ‘normal’.
predicted_anomalies (pd.Series) – Boolean pandas series, indicating whether an anomaly was detected. Index must match the data type of event_start and event_end.
normal_index (pd.Series, optional) – Boolean mask marking normal operation (True) vs non-actionable anomaly (False). Index must match the data type of event_start and event_end. Default: None.
evaluate_until_event_end (str or bool) – If True, evaluation is capped at event_end for all events. Allowed string values: ‘normal_only’, ‘anomaly_only’. Default: False.
event_id (int) – ID of event. If not specified, a counter is used instead. Defaults to None.
ignore_normal_index (bool) – Whether to ignore the normal index and evaluate all data points in the prediction or test dataset. Default False.

Returns:

Dictionary with computed metrics, e.g.:

{‘event_id’: int, ‘event_label’: str, ‘weighted_score’: float, ‘max_criticality’: float, ‘f_beta_score’: float or NaN, ‘accuracy’: float, ‘tp’: int, ‘fp’: int, ‘tn’: int, ‘fn’: int}

Return type:

dict

Raises:

ValueError – If event_label is invalid, evaluate_until_event_end has an unknown value, or if no data could be selected for the event.

Notes

The function sorts inputs by index to ensure alignment.
If normal_index is provided, this also influences the criticality calculation: criticality does not change if the expected behaviour is not normal.
If predicted_anomalies_event is empty, a ValueError is raised.
Use evaluate_until_event_end to control whether post-event predictions are considered.
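
To make the role of normal_index in evaluate_event more concrete, here is a minimal sketch of how pointwise confusion counts and accuracy could be derived when only timestamps with normal_index == True are scored. It works on plain lists rather than pandas series, and the helper names confusion_counts and accuracy_from_counts are hypothetical; the library's internal implementation may differ.

```python
from typing import Dict, Optional, Sequence


def confusion_counts(ground_truth: Sequence[bool],
                     predicted: Sequence[bool],
                     normal_index: Optional[Sequence[bool]] = None) -> Dict[str, int]:
    """Count tp/fp/tn/fn, skipping timestamps where normal_index is False."""
    if normal_index is None:
        normal_index = [True] * len(predicted)  # no mask: score every timestamp
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred, is_normal in zip(ground_truth, predicted, normal_index):
        if not is_normal:
            continue  # non-actionable anomaly: excluded from the metrics
        if truth and pred:
            counts["tp"] += 1
        elif pred:
            counts["fp"] += 1
        elif truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def accuracy_from_counts(counts: Dict[str, int]) -> float:
    """Fraction of scored timestamps that were classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else float("nan")


ground_truth = [False, False, True, True, True]
predicted = [False, True, True, False, True]
normal_index = [True, True, True, True, False]

counts = confusion_counts(ground_truth, predicted, normal_index)
event_accuracy = accuracy_from_counts(counts)
```

The last timestamp is masked out, so four of the five points are scored.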

property evaluated_events: DataFrame

Returns a DataFrame with evaluated events.

The DataFrame is built from the internal evaluated-events list. If the DataFrame is non-empty, an additional column ‘anomaly_detected’ is computed:

If anomaly_detection_method == ‘criticality’: anomaly_detected = max_criticality >= criticality_threshold
Else (fraction-based): anomaly_detected = (tp + fp) / (tp + fp + fn + tn) >= min_fraction_anomalous

Returns:

DataFrame containing evaluated event records. Expected columns include:

event_id (int)
event_label (str)
weighted_score (float)
max_criticality (float)
tp, fp, tn, fn (ints)
f_beta_score, accuracy (floats)
anomaly_detected (bool) — added as described above.

Return type:

pd.DataFrame
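
The two rules for the anomaly_detected column can be written out as a small sketch operating on one evaluated-event record. The function name is_anomaly_detected and the use of a plain dict are assumptions made for illustration; the property itself returns a DataFrame.

```python
from typing import Dict, Union

Number = Union[int, float]


def is_anomaly_detected(record: Dict[str, Number],
                        anomaly_detection_method: str = "criticality",
                        criticality_threshold: int = 72,
                        min_fraction_anomalous: float = 0.1) -> bool:
    """Apply the documented detection rule to one evaluated-event record."""
    if anomaly_detection_method == "criticality":
        return record["max_criticality"] >= criticality_threshold
    if anomaly_detection_method == "fraction":
        predicted_anomalous = record["tp"] + record["fp"]
        total = record["tp"] + record["fp"] + record["fn"] + record["tn"]
        # total is assumed > 0 because an event without data is rejected earlier
        return predicted_anomalous / total >= min_fraction_anomalous
    raise ValueError("anomaly_detection_method must be 'criticality' or 'fraction'")


record = {"max_criticality": 80, "tp": 5, "fp": 3, "tn": 90, "fn": 2}
by_criticality = is_anomaly_detected(record, "criticality")
by_fraction = is_anomaly_detected(record, "fraction")
```

The same record can be flagged by one method and not by the other: here the criticality reaches the threshold, while only 8 of 100 scored points are predicted anomalous.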

get_final_score(event_selection=None, criticality_threshold=None, min_fraction_anomalous=None)

Calculate the CARE-score for selected evaluated events.

The CARE score combines average Coverage (pointwise F-score for anomaly events), average Earliness (weighted score for anomaly events), average Accuracy (for normal events) and Reliability (eventwise F-score) using the configured weights.

If the average accuracy over all normal events is below 0.5 (worse than random guessing), the CARE-score equals the average accuracy over all normal events.

If no anomalies were detected, the CARE-score = 0. Else, the CARE-score is calculated as:

( (average F-score over all anomaly events) * coverage_w
+ (average weighted score over all anomaly events) * earliness_w
+ (average accuracy over all normal events) * accuracy_w
+ (event-wise F-score) * reliability_w ) / sum_of_weights

where sum_of_weights = coverage_w + earliness_w + accuracy_w + reliability_w.

Parameters:

event_selection (List[int]) – list of event IDs to include. Default: None (use all).
criticality_threshold (int) – If provided and anomaly_detection_method == ‘criticality’, override the stored threshold for this calculation.
min_fraction_anomalous (float) – If provided and anomaly_detection_method == ‘fraction’, override the stored min_fraction_anomalous for this calculation.

Returns:

CARE-score

Return type:

float

Raises:

ValueError – If the selected events do not contain at least one normal and one anomalous event.
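
The scoring rule of get_final_score can be summarised in a short sketch. It takes the four sub-scores as already computed numbers and names the weights after the constructor parameters coverage_w, earliness_w, accuracy_w and reliability_w. Checking the accuracy condition before the no-detection condition follows the order of the description and is an assumption, as is the function name care_score_sketch.

```python
def care_score_sketch(avg_coverage: float,
                      avg_earliness: float,
                      avg_accuracy: float,
                      reliability: float,
                      any_anomaly_detected: bool,
                      coverage_w: float = 1.0,
                      earliness_w: float = 1.0,
                      accuracy_w: float = 2.0,
                      reliability_w: float = 1.0) -> float:
    """Combine the four sub-scores into a CARE-score (illustrative only)."""
    if avg_accuracy < 0.5:
        # Worse than random guessing on normal events: return the accuracy itself.
        return avg_accuracy
    if not any_anomaly_detected:
        return 0.0
    sum_of_weights = coverage_w + earliness_w + accuracy_w + reliability_w
    weighted_sum = (avg_coverage * coverage_w
                    + avg_earliness * earliness_w
                    + avg_accuracy * accuracy_w
                    + reliability * reliability_w)
    return weighted_sum / sum_of_weights


score = care_score_sketch(avg_coverage=0.6, avg_earliness=0.7,
                          avg_accuracy=0.9, reliability=0.8,
                          any_anomaly_detected=True)
low_accuracy_score = care_score_sketch(0.6, 0.7, 0.4, 0.8, True)
no_detection_score = care_score_sketch(0.6, 0.7, 0.9, 0.8, False)
```

With the default weights, Accuracy counts twice as much as each of the other three components, so the first call evaluates to (0.6 + 0.7 + 2 * 0.9 + 0.8) / 5.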

load_evaluated_events(file_path)

Load evaluated events from a CSV file and replace the internal evaluated-events list.

Parameters: file_path (Path or str) – The file path from which the evaluated events will be loaded.
Return type: None

save_evaluated_events(file_path)

Write the evaluated events to a CSV file.

Parameters: file_path (Path or str) – The file path where the evaluated events will be saved.
Return type: None