energy_fault_detector.evaluation.care_score
- class CAREScore(coverage_beta=0.5, reliability_beta=0.5, coverage_w=1.0, accuracy_w=2.0, earliness_w=1.0, reliability_w=1.0, criticality_threshold=72, min_fraction_anomalous=0.1, ws_start_of_descend=(1, 4), anomaly_detection_method='criticality', *, min_fraction_anomalous_timestamps=None, eventwise_f_score_beta=None, weighted_score_w=None, eventwise_f_score_w=None)
Bases: object
Calculate the CARE score for early fault-detection algorithms.
The CARE score combines Coverage, Accuracy, Reliability and Earliness to evaluate early fault-detection performance (see “CARE to Compare: A Real-World Benchmark Dataset for Early Fault Detection in Wind Turbine Data”, https://doi.org/10.3390/data9120138). The goal of the CARE score is to evaluate how well a model separates normal behavior from actionable anomalies, i.e. anomalies that lead to or indicate a fault (see the glossary below for definitions).
- Usage:
For each event in your dataset, call evaluate_event. Afterwards, call get_final_score to calculate the CARE score of your model.
- Requirements:
Calculating the CARE score requires at least one evaluated anomaly event and at least one evaluated normal event (see glossary below).
- Glossary:
normal behavior: Data points representing expected system behavior.
actionable anomaly: Unexpected system behavior where maintenance could prevent a fault.
non-actionable anomaly: Neither normal behavior nor actionable anomaly (e.g. turbine shut down due to maintenance actions).
anomaly event: A sequence labeled with actionable anomalies within the prediction window.
normal event: A sequence containing normal behavior within the prediction window.
normal_index: Boolean mask separating normal behavior (True) from non-actionable anomaly (False).
Coverage: Pointwise F-score on anomaly events.
Accuracy: Accuracy on normal events.
Reliability: Eventwise F-score across evaluated events.
Earliness: Weighted score measuring how early an anomaly is detected.
criticality: Counting measure applied to the predictions of anomaly detection models and used for criticality-based detection. For a detailed explanation, see Algorithm 1 in the paper (https://doi.org/10.3390/data9120138). Choose the criticality threshold based on application needs, for example: how many anomalous points must be observed in advance for a detection to be actionable, and after how many anomalous data points is an anomaly significant?
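The counting rule behind criticality can be sketched in a few lines. This is a minimal sketch assuming the rule as we read Algorithm 1 of the paper: while the status is normal, an anomalous prediction raises the counter by 1 and a non-anomalous prediction lowers it by 1 (never below 0); while the status is not normal, the counter stays unchanged. The function criticality_sketch is hypothetical and not part of the library.

```python
from typing import List, Optional


def criticality_sketch(predicted: List[bool], normal: Optional[List[bool]] = None) -> List[int]:
    """Hypothetical helper: running criticality for a sequence of point predictions.

    Assumed rule (our reading of Algorithm 1 in the CARE paper):
    - normal status and anomalous prediction     -> counter + 1
    - normal status and non-anomalous prediction -> counter - 1, floored at 0
    - non-normal status                          -> counter unchanged
    """
    if normal is None:
        normal = [True] * len(predicted)  # treat every point as normal behavior
    if len(normal) != len(predicted):
        raise ValueError("predicted and normal must have the same length")
    crit = 0
    history = []
    for is_anomalous, is_normal in zip(predicted, normal):
        if is_normal:
            crit = crit + 1 if is_anomalous else max(crit - 1, 0)
        history.append(crit)
    return history


preds = [True, True, False, True, True, True]
status = [True, True, True, False, True, True]
crit = criticality_sketch(preds, status)
# An event would count as detected if max(crit) >= criticality_threshold.
print(crit, max(crit))
```

With the default threshold of 72, an event needs a sustained run of anomalous predictions during normal operation before it is flagged; short bursts are absorbed by the decrement step.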
Initialize CAREScore.
- Parameters:
coverage_beta (float) – Beta parameter for Coverage (pointwise F-score). Default: 0.5.
reliability_beta (float) – Beta parameter for Reliability (event-wise F-score). Default: 0.5.
coverage_w (float) – Weight for Coverage (pointwise F-score) in the final CARE-score. Default: 1.0.
accuracy_w (float) – Weight for Accuracy in the final CARE-score. Default: 2.0.
reliability_w (float) – Weight for Reliability (eventwise F-score) in the final CARE-score. Default: 1.0.
earliness_w (float) – Weight for Earliness (weighted score) in the final CARE-score. Default: 1.0.
anomaly_detection_method (str) – Method used to calculate anomaly detection score. Either ‘criticality’ or ‘fraction’. Default: ‘criticality’.
criticality_threshold (int) – Threshold for criticality-based detection. Default: 72. If the maximum criticality of an event reaches or exceeds this threshold, the event is detected as an anomaly.
min_fraction_anomalous (float) – Threshold for fraction-based detection. Default: 0.1. If the fraction of the event’s data points predicted as anomalous reaches or exceeds this threshold, the event is detected as an anomaly.
ws_start_of_descend (Tuple[int, int]) – Fraction of the event, given as (numerator, denominator), at which the weights of the Earliness score (weighted score) start to decay. Default: (1, 4).
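To illustrate what ws_start_of_descend controls, here is a hypothetical weight profile for the Earliness score. It assumes the weights stay at 1 up to the given fraction of the anomaly event and then fall linearly to 0 at its last point, and that the weighted score is the weighted share of points predicted as anomalous. The library's exact decay shape is not documented here and may differ; earliness_weights and weighted_score are illustrative names only.

```python
from typing import List, Tuple


def earliness_weights(n: int, start_of_descend: Tuple[int, int] = (1, 4)) -> List[float]:
    """Hypothetical weight profile: flat at 1.0, then linear decay to 0.0 at the last point."""
    num, den = start_of_descend
    start = (n * num) // den  # first index where the weights begin to decay (assumption)
    span = max(n - 1 - start, 1)  # number of steps from `start` to the last index
    weights = []
    for i in range(n):
        if i < start:
            weights.append(1.0)
        else:
            weights.append(1.0 - (i - start) / span)
    return weights


def weighted_score(predicted: List[bool], start_of_descend: Tuple[int, int] = (1, 4)) -> float:
    """Weighted share of anomalous predictions: early detections count more."""
    weights = earliness_weights(len(predicted), start_of_descend)
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w for w, p in zip(weights, predicted) if p) / total


early = [True] * 4 + [False] * 4  # anomaly flagged in the first half of the event
late = [False] * 4 + [True] * 4   # anomaly flagged only in the second half
print(weighted_score(early), weighted_score(late))
```

Under this assumed profile, flagging the first half of an 8-point event scores about 0.76, while flagging only the second half scores about 0.24.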
- Raises:
ValueError – If anomaly_detection_method is not ‘criticality’ or ‘fraction’.
- calculate_avg_accuracy(event_selection=None)
Return the average Accuracy across normal events.
- calculate_avg_coverage(event_selection=None)
Return the average Coverage (pointwise F-score) for anomaly events.
- calculate_avg_earliness(event_selection=None)
Return the average Earliness (weighted score) for anomaly events.
- calculate_reliability(event_selection=None, **kwargs)
Compute the Reliability (event-wise F-score) for selected events.
- Parameters:
event_selection (List[int]) – list of event IDs to include. Default: None (use all).
- Returns:
- Event-wise F-score computed with beta=self.reliability_beta.
If there are no positive labels or predictions, the result follows the zero_division behavior of sklearn’s fbeta_score.
- Return type:
float
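Reliability treats every evaluated event as a single sample. The following minimal sketch assumes that an anomaly event counts as a positive label and an event flagged as anomaly_detected counts as a positive prediction, and computes the event-wise F-beta score in plain Python. The helper names are hypothetical, and the zero_division fallback only mirrors the idea of the sklearn parameter of the same name.

```python
from typing import List


def fbeta(tp: int, fp: int, fn: int, beta: float = 0.5, zero_division: float = 0.0) -> float:
    """F-beta score from confusion counts; returns `zero_division` when undefined."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else zero_division


def eventwise_fscore(labels: List[str], detected: List[bool], beta: float = 0.5) -> float:
    """Hypothetical Reliability sketch: one sample per event, anomaly events are positives."""
    tp = sum(1 for lab, det in zip(labels, detected) if lab == "anomaly" and det)
    fp = sum(1 for lab, det in zip(labels, detected) if lab == "normal" and det)
    fn = sum(1 for lab, det in zip(labels, detected) if lab == "anomaly" and not det)
    return fbeta(tp, fp, fn, beta)


labels = ["anomaly", "anomaly", "normal", "normal"]
detected = [True, False, False, True]  # one hit, one miss, one false alarm
print(eventwise_fscore(labels, detected))
```

With real arrays, sklearn.metrics.fbeta_score(y_true, y_pred, beta=0.5) computes the same quantity from boolean labels and predictions.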
- static create_ground_truth(event_start, event_end, normal_index, event_label)
Create the ground truth labels based on the event_start, event_end, normal_index and event_label.
- Parameters:
event_start (int, pd.Timestamp) – Start index/timestamp of the event.
event_end (int, pd.Timestamp) – End index/timestamp of the event.
normal_index (pd.Series) – Boolean mask indicating normal samples. Must be indexed compatibly with event_start/event_end.
event_label (str) – ‘anomaly’ or ‘normal’. True label indicating the type of the event.
- Returns:
- Boolean series indexed like normal_index. True indicates anomaly (actionable and non-actionable), False otherwise.
- Return type:
pd.Series
Notes
The returned series is sorted by index.
normal_index is first inverted (anomaly = not normal). Then, if event_label == ‘anomaly’, the values in the interval [event_start:event_end] are set to True.
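The two steps in the Notes can be sketched with plain (index, value) pairs instead of a pd.Series, so the example runs without pandas. This is a minimal sketch: the function name is hypothetical, the event_label check is an extra safeguard of this sketch, and treating the window as inclusive of both endpoints is an assumption about the slicing used in the library.

```python
from typing import List, Sequence, Tuple


def ground_truth_sketch(
    index: Sequence[int],
    normal: Sequence[bool],
    event_start: int,
    event_end: int,
    event_label: str,
) -> List[Tuple[int, bool]]:
    """Hypothetical stand-in for create_ground_truth using (index, label) pairs."""
    if event_label not in ("anomaly", "normal"):
        raise ValueError("event_label must be 'anomaly' or 'normal'")
    pairs = sorted(zip(index, normal))  # sort by index, as the Notes describe
    # Step 1: invert the normal mask, so non-actionable anomalies become True.
    truth = [(i, not is_normal) for i, is_normal in pairs]
    # Step 2: for anomaly events, mark the event window (both endpoints included here).
    if event_label == "anomaly":
        truth = [(i, label or event_start <= i <= event_end) for i, label in truth]
    return truth


idx = [0, 1, 2, 3, 4, 5]
normal = [True, True, False, True, True, True]  # point 2 is a non-actionable anomaly
print(ground_truth_sketch(idx, normal, 3, 4, "anomaly"))
print(ground_truth_sketch(idx, normal, 3, 4, "normal"))
```

For a normal event, only the non-actionable points are True; for an anomaly event, the event window is added on top of them.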
- evaluate_event(event_start, event_end, event_label, predicted_anomalies, normal_index=None, evaluate_until_event_end=False, event_id=None, ignore_normal_index=False)
Evaluate the prediction of a fault detection model for a single event.
If a normal_index is provided, metrics are computed only for timestamps where normal_index is True (unless ignore_normal_index is True).
The argument evaluate_until_event_end determines which part of the provided data is used for evaluation. Setting it to True or ‘anomaly_only’ can be useful if you expect normal behaviour to change after a fault.
- Parameters:
event_start (int, pd.Timestamp) – Start index/timestamp of the event.
event_end (int, pd.Timestamp) – End index/timestamp of the event.
event_label (str) – True label of the event. This can be either ‘anomaly’ or ‘normal’.
predicted_anomalies (pd.Series) – Boolean pandas series, indicating whether an anomaly was detected. Index must match the data type of event_start and event_end.
normal_index (pd.Series, optional) – Boolean mask marking normal operation (True) vs non-actionable anomaly (False). Index must match the data type of event_start and event_end. Default: None.
evaluate_until_event_end (str or bool) – If True, evaluation is capped at event_end for all events. Allowed string values: ‘normal_only’, ‘anomaly_only’. Default: False.
event_id (int) – ID of the event. If not specified, an internal counter is used instead. Default: None.
ignore_normal_index (bool) – Whether to ignore the normal index and evaluate all data points in the prediction or test dataset. Default: False.
- Returns:
- Dictionary with computed metrics, e.g.:
{‘event_id’: int, ‘event_label’: str, ‘weighted_score’: float, ‘max_criticality’: float, ‘f_beta_score’: float or NaN, ‘accuracy’: float, ‘tp’: int, ‘fp’: int, ‘tn’: int, ‘fn’: int}
- Return type:
dict
- Raises:
ValueError – If event_label is invalid, evaluate_until_event_end has an unknown value, or if no data could be selected for the event.
Notes
The function sorts inputs by index to ensure alignment.
If normal_index is provided, it also influences the criticality calculation: criticality does not change while the expected behaviour is not normal.
If predicted_anomalies_event (the predictions selected for the event) is empty, a ValueError is raised.
Use evaluate_until_event_end to control whether post-event predictions are considered.
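The per-event point counts (tp, fp, tn, fn) and the derived accuracy and F-beta score can be sketched as follows. This assumes that points where normal_index is False are simply skipped and that the F-score is NaN when it is undefined (for example, a normal event with no positive predictions). The helper event_metrics_sketch is hypothetical; it only mirrors the documented return keys.

```python
import math
from typing import Dict, List, Optional, Union


def event_metrics_sketch(
    truth: List[bool],
    predicted: List[bool],
    normal: Optional[List[bool]] = None,
    beta: float = 0.5,
) -> Dict[str, Union[int, float]]:
    """Hypothetical point-wise metrics for one event, restricted to normal points."""
    if normal is None:
        normal = [True] * len(truth)
    tp = fp = tn = fn = 0
    for t, p, keep in zip(truth, predicted, normal):
        if not keep:
            continue  # non-actionable anomaly: excluded from evaluation
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("no data points selected for this event")
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / n,
        "f_beta_score": (1 + b2) * tp / denom if denom else math.nan,
    }


truth = [False, False, True, True, True, True]
predicted = [False, True, False, True, True, True]
normal = [True, True, True, True, False, True]  # point 4 is excluded
print(event_metrics_sketch(truth, predicted, normal))
```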
- property evaluated_events: DataFrame
Returns a DataFrame with evaluated events.
The DataFrame is built from the internal evaluated-events list. If the DataFrame is non-empty, an additional column ‘anomaly_detected’ is computed:
If anomaly_detection_method == ‘criticality’: anomaly_detected = max_criticality >= criticality_threshold
Else (fraction-based): anomaly_detected = (tp + fp) / (tp + fp + fn + tn) >= min_fraction_anomalous
- Returns:
- DataFrame containing evaluated event records. Expected columns include:
event_id (int)
event_label (str)
weighted_score (float)
max_criticality (float)
tp, fp, tn, fn (ints)
f_beta_score, accuracy (floats)
anomaly_detected (bool) — added as described above.
- Return type:
pd.DataFrame
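The anomaly_detected rule can be expressed as a small function over one event record. This sketch is based only on the two formulas above; the function name is hypothetical and the record is a plain dict using the documented column names.

```python
from typing import Mapping


def anomaly_detected_sketch(
    record: Mapping[str, float],
    method: str = "criticality",
    criticality_threshold: int = 72,
    min_fraction_anomalous: float = 0.1,
) -> bool:
    """Hypothetical restatement of the documented anomaly_detected rule."""
    if method == "criticality":
        return record["max_criticality"] >= criticality_threshold
    if method == "fraction":
        total = record["tp"] + record["fp"] + record["fn"] + record["tn"]
        # Share of evaluated points predicted as anomalous (tp + fp).
        return (record["tp"] + record["fp"]) / total >= min_fraction_anomalous
    raise ValueError("method must be 'criticality' or 'fraction'")


rec = {"max_criticality": 72, "tp": 0, "fp": 5, "tn": 95, "fn": 0}
print(anomaly_detected_sketch(rec), anomaly_detected_sketch(rec, method="fraction"))
```

The same record can be flagged by one method and not the other: it reaches the criticality threshold exactly, but only 5% of its points are predicted as anomalous.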
- get_final_score(event_selection=None, criticality_threshold=None, min_fraction_anomalous=None)
Calculate the CARE-score for selected evaluated events.
The CARE score combines average Coverage (pointwise F-score for anomaly events), average Earliness (weighted score for anomaly events), average Accuracy (for normal events) and Reliability (eventwise F-score) using the configured weights.
- If the average accuracy over all normal events is below 0.5 (worse than random guessing), the CARE score equals that average accuracy.
If no anomalies were detected, the CARE score is 0. Otherwise, the CARE score is calculated as:
$$\mathrm{CARE} = \frac{\overline{\mathrm{Coverage}}\cdot\texttt{coverage\_w} + \overline{\mathrm{Earliness}}\cdot\texttt{earliness\_w} + \overline{\mathrm{Accuracy}}\cdot\texttt{accuracy\_w} + \mathrm{Reliability}\cdot\texttt{reliability\_w}}{\texttt{coverage\_w} + \texttt{earliness\_w} + \texttt{accuracy\_w} + \texttt{reliability\_w}}$$
where Coverage (pointwise F-score) and Earliness (weighted score) are averaged over all anomaly events, Accuracy is averaged over all normal events, and Reliability is the event-wise F-score.
- Parameters:
event_selection (List[int]) – list of event IDs to include. Default: None (use all).
criticality_threshold (int) – If provided and anomaly_detection_method == ‘criticality’, override the stored threshold for this calculation.
min_fraction_anomalous (float) – If provided and anomaly_detection_method == ‘fraction’, override the stored min_fraction_anomalous for this calculation.
- Returns:
CARE-score
- Return type:
float
- Raises:
ValueError – If the selected events do not contain at least one normal and one anomalous event.
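The combination rule of get_final_score can be sketched as a plain function of the four components. This minimal sketch assumes the components were already averaged as described above and interprets “no anomalies were detected” as “no evaluated event has anomaly_detected set”. The function and its arguments are hypothetical; the default weights mirror the constructor defaults, and the accuracy check is applied before the detection check, following the order in the description.

```python
def care_score_sketch(
    coverage: float,
    earliness: float,
    accuracy: float,
    reliability: float,
    any_event_detected: bool,
    coverage_w: float = 1.0,
    earliness_w: float = 1.0,
    accuracy_w: float = 2.0,
    reliability_w: float = 1.0,
) -> float:
    """Hypothetical CARE-score combination following the documented rules."""
    if accuracy < 0.5:
        return accuracy  # worse than random guessing on normal events
    if not any_event_detected:
        return 0.0
    total_w = coverage_w + earliness_w + accuracy_w + reliability_w
    return (
        coverage * coverage_w
        + earliness * earliness_w
        + accuracy * accuracy_w
        + reliability * reliability_w
    ) / total_w


print(round(care_score_sketch(0.6, 0.5, 0.9, 0.8, any_event_detected=True), 3))
```

Because accuracy_w defaults to 2.0, the accuracy on normal events carries twice the weight of each other component.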
- load_evaluated_events(file_path)
Load evaluated events from a CSV file and replace the internal evaluated-events list.