energy_fault_detector.utils.data_downloads

download_file(session, url, dest)

Download a single file to disk using streaming.

Parameters:

session (Session) – A requests.Session to perform the download.
url (str) – Direct file URL (e.g., the file's links.content, links.download, or links.self entry).
dest (Path) – Destination path to write the file to.

Raises:

requests.HTTPError – If the download fails.
OSError – If writing to disk fails.
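Here, "streaming" means the response body is written to disk chunk by chunk rather than loaded into memory all at once. The following is a hypothetical re-implementation, not the module's source. The name download_file_sketch, the 1 MiB chunk size, the 60-second timeout, and the creation of missing parent directories are all assumptions. The sketch uses only standard requests calls (Session.get with stream=True, Response.raise_for_status, Response.iter_content). requests is imported for type checking only.

```python
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # requests is only needed for the type hint
    import requests


def download_file_sketch(session: "requests.Session", url: str, dest: Path,
                         chunk_size: int = 1024 * 1024) -> None:
    """Stream `url` to `dest` in fixed-size chunks (hypothetical sketch)."""
    # Assumption: missing parent directories are created automatically
    dest.parent.mkdir(parents=True, exist_ok=True)
    # stream=True defers downloading the body, so it is never held in memory at once
    with session.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
        with open(dest, "wb") as fh:  # OSError propagates if writing fails
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    fh.write(chunk)
```

Because raise_for_status() raises requests.HTTPError and open()/write() raise OSError, the two documented exceptions propagate without any extra handling.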

download_zenodo_data(identifier='10.5281/zenodo.15846963', dest='./downloads', overwrite=False)

Download a Zenodo record via the Zenodo REST API and extract any downloaded .zip files.

Parameters:

identifier (str) – Zenodo record ID, DOI (e.g., 10.5281/zenodo.15846963), or record URL.
dest (Path) – Output directory (default: ./downloads).
overwrite (bool) – If True and dest already exists, the contents of dest are overwritten.

Returns:

List of paths to the extracted content of all downloaded zip files. If only one zip file was downloaded, a single Path is returned instead of a list.

Return type:

Union[List[Path], Path]
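The return value is either a list or a single Path. Callers that want to handle both cases the same way may want to normalize it first. The helper as_path_list below is hypothetical and not part of the module. The commented usage lines require network access and are not executed.

```python
from pathlib import Path
from typing import List, Union


def as_path_list(result: Union[List[Path], Path]) -> List[Path]:
    """Normalize a Union[List[Path], Path] return value to a list (hypothetical helper)."""
    # A single Path means only one zip file was downloaded and extracted
    return [result] if isinstance(result, Path) else list(result)


# Usage (requires network access; shown for illustration only):
# from energy_fault_detector.utils.data_downloads import download_zenodo_data
# for path in as_path_list(download_zenodo_data(dest="./downloads")):
#     print(path)
```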

fetch_record(session, record_id)

Fetch record metadata from Zenodo’s REST API.

Parameters:

session (Session) – A requests.Session (may include auth header for restricted files).
record_id (str) – Numeric Zenodo record ID.

Return type:

dict

Returns:

Parsed JSON payload of the record.

Raises:

requests.HTTPError – If the HTTP request fails.
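The following is a minimal sketch of the request this function likely performs. It assumes Zenodo's public records endpoint https://zenodo.org/api/records/<id>. The constant ZENODO_RECORDS_API and the name fetch_record_sketch are illustrative assumptions. For restricted files, Zenodo accepts a personal access token in an "Authorization: Bearer <token>" header. Setting that header once on session.headers makes the session send it with every request.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # requests is only needed for the type hint
    import requests

# Assumed base URL of Zenodo's records endpoint
ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def fetch_record_sketch(session: "requests.Session", record_id: str) -> dict:
    """Return the parsed JSON metadata for a Zenodo record (hypothetical sketch)."""
    response = session.get(f"{ZENODO_RECORDS_API}/{record_id}", timeout=30)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.json()
```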

list_files(session, record_json)

Return a list of file descriptors for a Zenodo record.

Supports both embedded ‘files’ in the record JSON and the ‘links.files’ endpoint (newer API).

Parameters:

session (Session) – A requests.Session to use for any follow-up call.
record_json (dict) – Record JSON as returned by fetch_record().

Return type:

list[dict]

Returns:

A list of file dicts, each with at least a ‘links’ entry and a ‘key’ or ‘filename’ entry.

Raises:

RuntimeError – If no files are found.
requests.HTTPError – If loading the files listing endpoint fails.
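The fallback logic described above might look roughly like the following. The name list_files_sketch is hypothetical. The shape of the links.files response is also an assumption: the sketch expects an object with an "entries" list, as InvenioRDM-based Zenodo returns, and also accepts a bare list.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:  # requests is only needed for the type hint
    import requests


def list_files_sketch(session: "requests.Session", record_json: dict) -> List[dict]:
    """Return file descriptors from a Zenodo record (hypothetical sketch)."""
    # Case 1: files embedded directly in the record JSON as a non-empty list
    embedded = record_json.get("files")
    if isinstance(embedded, list) and embedded:
        return embedded

    # Case 2: follow the links.files endpoint (newer API)
    files_url = record_json.get("links", {}).get("files")
    if files_url:
        response = session.get(files_url, timeout=30)
        response.raise_for_status()  # raises requests.HTTPError on failure
        payload = response.json()
        # Assumed response shape: {"entries": [...]}; a bare list is also accepted
        entries = payload.get("entries", []) if isinstance(payload, dict) else payload
        if entries:
            return list(entries)

    raise RuntimeError("No files found in the Zenodo record.")
```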

parse_record_id(identifier)

Extract a Zenodo record ID from an ID, DOI, or URL.

Accepts:

Numeric ID (e.g., “15846963”)
DOI (e.g., “10.5281/zenodo.15846963”)
Record URL (e.g., “https://zenodo.org/records/15846963”)

Parameters:

identifier (str) – Input string containing an ID, DOI, or URL.

Return type:

str

Returns:

The numeric record ID as a string.

Raises:

ValueError – If a record ID cannot be parsed from the input.
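The following is a regex-based sketch that covers the three accepted forms. The name parse_record_id_sketch is hypothetical. The patterns are assumptions rather than the module's exact rules, including the tolerance for the older /record/<id> URL path.

```python
import re


def parse_record_id_sketch(identifier: str) -> str:
    """Extract a numeric Zenodo record ID from an ID, DOI, or URL (hypothetical sketch)."""
    s = identifier.strip()
    # Plain numeric ID, e.g. "15846963"
    if s.isdigit():
        return s
    # DOI form, e.g. "10.5281/zenodo.15846963" (also matches inside doi.org URLs)
    match = re.search(r"zenodo\.(\d+)", s)
    if match:
        return match.group(1)
    # Record URL, e.g. "https://zenodo.org/records/15846963" (or the older /record/ path)
    match = re.search(r"/records?/(\d+)", s)
    if match:
        return match.group(1)
    raise ValueError(f"Cannot parse a Zenodo record ID from {identifier!r}")
```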

prepare_output_dir(out_dir, overwrite)

Ensure the output directory is ready.

If the directory exists and overwrite is True, its contents are removed and the directory is recreated empty. If it exists and overwrite is False, it is left as is. If it does not exist, it is created.

Parameters:

out_dir (Path) – Target output directory path.
overwrite (bool) – Whether to clear and recreate the directory if it exists.

Raises:

OSError – If filesystem operations fail.
RuntimeError – If out_dir points to a path that is unsafe to remove.

Return type:

None
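The following is a minimal sketch of the three cases using shutil.rmtree and Path.mkdir. The name prepare_output_dir_sketch is hypothetical. The description above does not specify what counts as "unsafe". This sketch assumes it means a filesystem root or the current working directory.

```python
import shutil
from pathlib import Path


def prepare_output_dir_sketch(out_dir: Path, overwrite: bool) -> None:
    """Create out_dir, optionally wiping it first (hypothetical sketch)."""
    out_dir = Path(out_dir)
    if out_dir.exists() and overwrite:
        resolved = out_dir.resolve()
        # Assumed safety rule: never wipe a filesystem root or the current working directory
        if resolved == Path(resolved.anchor) or resolved == Path.cwd().resolve():
            raise RuntimeError(f"Refusing to remove unsafe path: {resolved}")
        shutil.rmtree(resolved)  # OSError propagates if removal fails
    # Creates the directory if missing; a no-op if it exists and overwrite is False
    out_dir.mkdir(parents=True, exist_ok=True)
```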

safe_extract_zip(zip_path, dest_dir)

Extract a ZIP archive safely, preventing path traversal (zip-slip).

Validates that each member will extract under dest_dir before extraction.

Parameters:

zip_path (Path) – Path to the .zip archive.
dest_dir (Path) – Directory to extract into (created if missing).

Raises:

RuntimeError – If an unsafe member path is detected.
zipfile.BadZipFile – If the archive is invalid or corrupted.
OSError – If filesystem operations fail.
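The following sketches the validate-then-extract approach. Each member path is resolved against dest_dir and rejected if it would land outside it. Only after every member passes is the archive extracted. The name safe_extract_zip_sketch is hypothetical. zipfile.ZipFile itself raises zipfile.BadZipFile for invalid archives.

```python
import zipfile
from pathlib import Path


def safe_extract_zip_sketch(zip_path: Path, dest_dir: Path) -> None:
    """Extract zip_path into dest_dir, rejecting zip-slip members (hypothetical sketch)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    root = dest_dir.resolve()
    with zipfile.ZipFile(zip_path) as zf:  # raises zipfile.BadZipFile if invalid
        # Validate every member first so nothing is written if any path is unsafe
        for member in zf.namelist():
            target = (root / member).resolve()
            # Reject absolute paths and "../" escapes that resolve outside root
            if target != root and root not in target.parents:
                raise RuntimeError(f"Unsafe path in archive: {member!r}")
        zf.extractall(root)
```

Because validation happens before any write, a rejected archive leaves dest_dir empty apart from its own creation.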