energy_fault_detector.utils.data_downloads

download_file(session, url, dest)

Download a single file to disk using streaming.

Parameters:
  • session (Session) – A requests.Session to perform the download.

  • url (str) – Direct file URL (e.g., links.content/download/self).

  • dest (Path) – Destination path to write the file to.

Raises:
  • requests.HTTPError – If the download fails.

  • OSError – If writing to disk fails.
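
Example (illustrative sketch, not the library's implementation): the streaming download described above could look roughly like this. The name download_file_sketch and the chunk_size parameter are assumptions made for the example; session is expected to behave like a requests.Session.

```python
from pathlib import Path


def download_file_sketch(session, url: str, dest: Path, chunk_size: int = 1 << 20) -> None:
    """Stream `url` to `dest` in chunks (hypothetical helper for illustration)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # stream=True defers fetching the body until it is iterated,
    # so the whole file is never held in memory at once.
    with session.get(url, stream=True) as response:
        response.raise_for_status()  # requests raises HTTPError on 4xx/5xx
        with open(dest, "wb") as fh:  # OSError propagates if writing fails
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    fh.write(chunk)
```

Writing chunk by chunk keeps memory use bounded by chunk_size regardless of file size.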

download_zenodo_data(identifier='10.5281/zenodo.15846963', dest='./downloads', remove_zip=True, overwrite=False, flatten_file_structure=True, expected_file_types='*.csv')

Download a Zenodo record via API and unzip any .zip files.

Downloads all files associated with a given Zenodo record (by ID, DOI, or URL), saves them to a local directory, and optionally flattens nested directories that result from extracting ZIP archives.

Parameters:
  • identifier (str) – Zenodo record ID, DOI (e.g., 10.5281/zenodo.15846963), or record URL. Defaults to the CARE2Compare dataset.

  • dest (Path) – Local output directory to save downloaded files. Default is ./downloads.

  • remove_zip (bool) – If True, ZIP archives are removed after extraction. Default is True.

  • overwrite (bool) – If True and dest already exists, contents of dest will be overwritten. Default is False.

  • flatten_file_structure (bool) – If True and unzipping results in a single top-level folder with no conflicting root-level files matching expected_file_types, moves its contents up one level. Default is True.

  • expected_file_types (Union[List[str], str]) – Glob pattern(s) used to detect existing relevant files at the root. If any match, flattening is skipped. Can be a string like '*.csv' or a list like ['*.csv', '*.json']. Default is '*.csv'.

Returns:

The absolute path to the directory containing the downloaded and unzipped data.

Return type:

Path
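
Example (illustrative sketch): one possible reading of the flatten_file_structure rule described above. The helper name flatten_single_top_level, the staging-folder trick, and the choice to abort on any name clash are assumptions made for the example; the library's actual logic may differ.

```python
import shutil
import uuid
from pathlib import Path
from typing import List, Union


def flatten_single_top_level(root: Path, expected_file_types: Union[List[str], str] = "*.csv") -> bool:
    """Move the contents of a lone top-level folder up one level. Returns True if flattened."""
    patterns = [expected_file_types] if isinstance(expected_file_types, str) else list(expected_file_types)
    # Relevant files already sit at the root: leave the layout alone.
    if any(match.is_file() for pattern in patterns for match in root.glob(pattern)):
        return False
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if len(subdirs) != 1:
        return False
    top = subdirs[0]
    # Move the folder aside under a unique name so a child that shares
    # the folder's own name cannot clash with it.
    staging = root / f".flatten-{uuid.uuid4().hex}"
    top.rename(staging)
    if any((root / child.name).exists() for child in staging.iterdir()):
        staging.rename(top)  # undo: a root-level entry would be overwritten
        return False
    for child in list(staging.iterdir()):
        shutil.move(str(child), str(root / child.name))
    staging.rmdir()
    return True
```

With this reading, a layout such as dest/data/a.csv becomes dest/a.csv, while a dest that already holds a matching CSV at its root is left untouched.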

fetch_record(session, record_id)

Fetch record metadata from Zenodo’s REST API.

Parameters:
  • session (Session) – A requests.Session (may include auth header for restricted files).

  • record_id (str) – Numeric Zenodo record ID.

Return type:

dict

Returns:

Parsed JSON payload of the record.

Raises:

requests.HTTPError – If the HTTP request fails.
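
Example (illustrative sketch): fetching record metadata from the public Zenodo records endpoint. The name fetch_record_sketch, the ZENODO_API constant, and the 30-second timeout are assumptions made for the example; session is expected to behave like a requests.Session.

```python
ZENODO_API = "https://zenodo.org/api/records"  # public Zenodo REST endpoint


def fetch_record_sketch(session, record_id: str, timeout: float = 30.0) -> dict:
    """Return the parsed JSON metadata of a record (hypothetical helper)."""
    response = session.get(f"{ZENODO_API}/{record_id}", timeout=timeout)
    response.raise_for_status()  # requests raises HTTPError on 4xx/5xx
    return response.json()
```

For restricted files, an access token would typically be set on the session beforehand, for instance via session.headers["Authorization"] = "Bearer <token>".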

list_files(session, record_json)

Return a list of file descriptors for a Zenodo record.

Supports both embedded ‘files’ in the record JSON and the ‘links.files’ endpoint (newer API).

Parameters:
  • session (Session) – A requests.Session to use for any follow-up call.

  • record_json (dict) – Record JSON as returned by fetch_record().

Return type:

list[dict]

Returns:

A list of file dicts with at least ‘links’ and ‘key’/‘filename’.

Raises:
  • RuntimeError – If no files are found.

  • requests.HTTPError – If loading the files listing endpoint fails.
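
Example (illustrative sketch): handling both listing styles mentioned above. The name list_files_sketch is hypothetical, and the assumption that the files endpoint wraps its items in an 'entries' key is a guess about the newer API layout, not something this reference confirms.

```python
def list_files_sketch(session, record_json: dict, timeout: float = 30.0) -> list:
    """Return file descriptors from embedded 'files' or the 'links.files' endpoint."""
    embedded = record_json.get("files")
    if isinstance(embedded, list) and embedded:
        return embedded  # older style: files embedded in the record JSON
    files_url = record_json.get("links", {}).get("files")
    if files_url:
        response = session.get(files_url, timeout=timeout)
        response.raise_for_status()  # requests raises HTTPError on failure
        payload = response.json()
        # Assumption: the listing is either a bare list or {"entries": [...]}.
        entries = payload.get("entries", []) if isinstance(payload, dict) else payload
        if entries:
            return list(entries)
    raise RuntimeError("No files found for this record")
```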

parse_record_id(identifier)

Extract a Zenodo record ID from an ID, DOI, or URL.

Accepts a numeric record ID, a DOI (e.g., 10.5281/zenodo.15846963), or a record URL.

Parameters:

identifier (str) – Input string containing an ID, DOI, or URL.

Return type:

str

Returns:

The numeric record ID as a string.

Raises:

ValueError – If a record ID cannot be parsed from the input.
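
Example (illustrative sketch): one way to pull the numeric ID out of the three input forms. The name parse_record_id_sketch and the exact regular expressions are assumptions made for the example and may be stricter or looser than the library's own parsing.

```python
import re


def parse_record_id_sketch(identifier: str) -> str:
    """Return the numeric record ID found in an ID, DOI, or URL string."""
    text = identifier.strip()
    if re.fullmatch(r"[0-9]+", text):
        return text  # already a plain record ID
    # DOI form, bare or as a doi.org URL: 10.5281/zenodo.<id>
    match = re.search(r"zenodo\.([0-9]+)", text, flags=re.IGNORECASE)
    if match:
        return match.group(1)
    # Record URL form: .../record/<id> or .../records/<id>
    match = re.search(r"/records?/([0-9]+)", text)
    if match:
        return match.group(1)
    raise ValueError(f"Cannot parse a Zenodo record ID from {identifier!r}")
```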

prepare_output_dir(out_dir, overwrite)

Ensure the output directory is ready.

If the directory exists and overwrite is True, its contents are removed and the directory is recreated empty. If it exists and overwrite is False, it is left as is. If it does not exist, it is created.

Parameters:
  • out_dir (Path) – Target output directory path.

  • overwrite (bool) – Whether to clear and recreate the directory if it exists.

Raises:
  • OSError – If filesystem operations fail.

  • RuntimeError – If out_dir points to an unsafe path to remove.

Return type:

None
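
Example (illustrative sketch): the three cases described above, with a simple guard against removing dangerous paths. The name prepare_output_dir_sketch and the definition of "unsafe" used here (filesystem root, home directory, current working directory) are assumptions made for the example; the library may use different criteria.

```python
import shutil
from pathlib import Path


def prepare_output_dir_sketch(out_dir: Path, overwrite: bool) -> None:
    """Create `out_dir`, clearing it first when it exists and `overwrite` is True."""
    out_dir = Path(out_dir)
    if out_dir.exists() and overwrite:
        resolved = out_dir.resolve()
        # Assumed set of paths that must never be wiped.
        unsafe = {Path(resolved.anchor), Path.home().resolve(), Path.cwd().resolve()}
        if resolved in unsafe:
            raise RuntimeError(f"Refusing to remove unsafe path: {resolved}")
        shutil.rmtree(resolved)  # OSError propagates on failure
    # Creates the directory if missing; leaves an existing one as is.
    out_dir.mkdir(parents=True, exist_ok=True)
```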

recursive_safe_extract(zip_path, dest_dir, remove_archives=True)

Recursively extract ZIP files, including archives nested inside other ZIPs.

Parameters:
  • zip_path (Path) – Path to the .zip archive.

  • dest_dir (Path) – Directory to extract into.

  • remove_archives (bool) – Whether to delete the .zip file after successful extraction.
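
Example (illustrative sketch): the recursion can be pictured as "extract, then repeat for every archive that just appeared". The name recursive_extract_sketch is hypothetical, nested archives are assumed to unpack into their own parent folder, and zipfile's extractall is used for brevity where the library presumably delegates to safe_extract_zip().

```python
import zipfile
from pathlib import Path


def recursive_extract_sketch(zip_path: Path, dest_dir: Path, remove_archives: bool = True) -> None:
    """Extract `zip_path` into `dest_dir`, then any ZIPs that extraction produced."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Remember archives that were already present so they are not reprocessed.
    already_present = set(dest_dir.rglob("*.zip"))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    if remove_archives:
        zip_path.unlink()
    # Only archives that appeared during this extraction are followed.
    for nested in sorted(set(dest_dir.rglob("*.zip")) - already_present):
        recursive_extract_sketch(nested, nested.parent, remove_archives)
```

Tracking the archives seen before extraction keeps the recursion finite even when remove_archives is False.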

safe_extract_zip(zip_path, dest_dir)

Extract a ZIP archive safely, preventing path traversal (zip-slip).

Validates that each member will extract under dest_dir before extraction.

Parameters:
  • zip_path (Path) – Path to the .zip archive.

  • dest_dir (Path) – Directory to extract into (created if missing).
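
Example (illustrative sketch): the zip-slip check amounts to resolving each member's target path and confirming it stays under dest_dir before anything is written. The name safe_extract_zip_sketch is hypothetical, and ValueError is only a placeholder exception type, since this reference does not show which exception the library raises.

```python
import zipfile
from pathlib import Path


def safe_extract_zip_sketch(zip_path: Path, dest_dir: Path) -> None:
    """Extract `zip_path` into `dest_dir` after validating every member path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    root = dest_dir.resolve()
    with zipfile.ZipFile(zip_path) as archive:
        # Validate all members first so nothing is written if one is unsafe.
        for member in archive.infolist():
            target = (root / member.filename).resolve()
            if target != root and root not in target.parents:
                # Placeholder exception type (assumption, see lead-in).
                raise ValueError(f"Unsafe path in archive: {member.filename}")
        archive.extractall(root)
```

A member named ../evil.txt resolves to a path outside dest_dir and is rejected before extraction begins.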

Raises: