energy_fault_detector.utils.data_downloads
- download_file(session, url, dest)
Download a single file to disk using streaming.
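As an illustration of streaming a download to disk, here is a minimal sketch. It assumes session behaves like a requests.Session. The name stream_to_disk, the chunk size, and the timeout are hypothetical choices for this example and are not taken from the library's implementation.

```python
from pathlib import Path


def stream_to_disk(session, url, dest, chunk_size=1024 * 1024, timeout=60):
    """Hypothetical helper: write the body of url to dest chunk by chunk."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # stream=True defers fetching the body until iter_content() is consumed,
    # so the whole file is never held in memory at once.
    with session.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    fh.write(chunk)
    return dest
```

With the requests package installed, a call could look like stream_to_disk(requests.Session(), url, "downloads/data.zip").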
- download_zenodo_data(identifier='10.5281/zenodo.15846963', dest='./downloads', remove_zip=True, overwrite=False, flatten_file_structure=True, expected_file_types='*.csv')
Download a Zenodo record via API and unzip any .zip files.
Downloads all files associated with a given Zenodo record (by ID, DOI, or URL), saves them to a local directory, and optionally flattens nested directories that result from extracting ZIP archives.
- Parameters:
identifier (str) – Zenodo record ID, DOI (e.g., 10.5281/zenodo.15846963), or record URL. Defaults to the CARE2Compare dataset.
dest (Path) – Local output directory in which to save the downloaded files. Default is ‘./downloads’.
remove_zip (bool) – If True, ZIP archives will be removed after extraction.
overwrite (bool) – If True and dest already exists, contents of dest will be overwritten. Default is False.
flatten_file_structure (bool) – If True and unzipping results in a single top-level folder with no conflicting root-level files matching expected_file_types, moves its contents up one level. Default is True.
expected_file_types (Union[List[str], str]) – Glob pattern(s) used to detect existing relevant files at the root. If any match, flattening is skipped. Can be a string like ‘*.csv’ or a list like [‘*.csv’, ‘*.json’]. Default is ‘*.csv’.
- Returns:
The absolute path to the directory containing the downloaded and unzipped data.
- Return type:
Path
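The flattening rule described for flatten_file_structure and expected_file_types can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: the helper name flatten_single_top_folder is hypothetical, and the extra check that refuses to overwrite existing root entries is an added precaution of this sketch.

```python
import shutil
from pathlib import Path


def flatten_single_top_folder(dest, expected_file_types="*.csv"):
    """Hypothetical helper: lift a lone top-level folder's contents into dest.

    Returns True if the layout was flattened, False if it was left untouched.
    """
    dest = Path(dest)
    if isinstance(expected_file_types, str):
        patterns = [expected_file_types]
    else:
        patterns = list(expected_file_types)
    # Relevant files already sit at the root: skip flattening.
    if any(p.is_file() for pat in patterns for p in dest.glob(pat)):
        return False
    subdirs = [p for p in dest.iterdir() if p.is_dir()]
    if len(subdirs) != 1:
        return False
    top = subdirs[0]
    children = list(top.iterdir())
    # Precaution of this sketch: never overwrite an existing root entry.
    if any((dest / child.name).exists() for child in children):
        return False
    for child in children:
        shutil.move(str(child), str(dest / child.name))
    top.rmdir()
    return True
```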
- fetch_record(session, record_id)
Fetch record metadata from Zenodo’s REST API.
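A minimal sketch of such a metadata request, assuming session behaves like a requests.Session and that records are served under https://zenodo.org/api/records/. The function name, the base_url parameter, and the timeout are hypothetical and only serve this example.

```python
def fetch_record_sketch(session, record_id,
                        base_url="https://zenodo.org/api/records", timeout=30):
    """Hypothetical equivalent: GET the record JSON, raising on HTTP errors."""
    resp = session.get(f"{base_url}/{record_id}", timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```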
- list_files(session, record_json)
Return a list of file descriptors for a Zenodo record.
Supports both embedded ‘files’ in the record JSON and the ‘links.files’ endpoint (newer API).
- Parameters:
session (Session) – A requests.Session to use for any follow-up call.
record_json (dict) – Record JSON as returned by fetch_record().
- Returns:
A list of file dicts with at least ‘links’ and ‘key’/‘filename’.
- Return type:
List[dict]
- Raises:
RuntimeError – If no files are found.
requests.HTTPError – If loading the files listing endpoint fails.
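The two lookup paths can be sketched as below. This is a hedged illustration, not the library's code: the function name is hypothetical, and the assumption that the ‘links.files’ endpoint returns its list under an ‘entries’ key is not confirmed by this page.

```python
def list_files_sketch(session, record_json, timeout=30):
    """Hypothetical equivalent: return file descriptors from either layout."""
    # Older layout: file descriptors embedded directly in the record JSON.
    files = record_json.get("files")
    if isinstance(files, list) and files:
        return files
    # Newer layout (assumed shape): follow links.files and read "entries".
    files_url = record_json.get("links", {}).get("files")
    if files_url:
        resp = session.get(files_url, timeout=timeout)
        resp.raise_for_status()  # surfaces an HTTP error from the listing call
        entries = resp.json().get("entries", [])
        if entries:
            return entries
    raise RuntimeError("No files found for this Zenodo record.")
```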
- parse_record_id(identifier)
Extract a Zenodo record ID from an ID, DOI, or URL.
- Accepts:
Numeric ID (e.g., “15846963”)
DOI (e.g., “10.5281/zenodo.15846963”)
Record URL (e.g., “https://zenodo.org/records/15846963”)
- Parameters:
identifier (str) – Input string containing an ID, DOI, or URL.
- Returns:
The numeric record ID as a string.
- Return type:
str
- Raises:
ValueError – If a record ID cannot be parsed from the input.
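One way to handle the three accepted forms is a pair of regular expressions, sketched below. The function name and the exact patterns are assumptions made for illustration. The library may parse identifiers differently.

```python
import re


def parse_record_id_sketch(identifier):
    """Hypothetical equivalent: extract the numeric Zenodo record ID."""
    text = identifier.strip()
    # Plain numeric ID.
    if re.fullmatch(r"\d+", text):
        return text
    # DOI form (10.5281/zenodo.<id>), then URL form (/records/<id> or /record/<id>).
    match = (re.search(r"zenodo\.(\d+)", text, re.IGNORECASE)
             or re.search(r"/records?/(\d+)", text))
    if match:
        return match.group(1)
    raise ValueError(f"Could not parse a Zenodo record ID from {identifier!r}")
```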
- prepare_output_dir(out_dir, overwrite)
Ensure the output directory is ready.
If the directory exists and overwrite is True, its contents are removed and the directory is recreated empty. If it exists and overwrite is False, it is left as is. If it does not exist, it is created.
- Raises:
OSError – If filesystem operations fail.
RuntimeError – If out_dir points to an unsafe path to remove.
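The three cases above, plus a guard against removing an unsafe path, can be sketched as follows. What counts as "unsafe" is not specified on this page. The rule used here (filesystem root, home directory, current working directory) and the function name are assumptions of this sketch.

```python
import shutil
from pathlib import Path


def prepare_output_dir_sketch(out_dir, overwrite):
    """Hypothetical equivalent: create out_dir, emptying it first if asked."""
    out_dir = Path(out_dir)
    if out_dir.exists() and overwrite:
        resolved = out_dir.resolve()
        # Assumed safety rule: never wipe the root, the home directory, or the cwd.
        unsafe = {Path(resolved.anchor), Path.home().resolve(), Path.cwd().resolve()}
        if resolved in unsafe:
            raise RuntimeError(f"Refusing to remove unsafe path: {resolved}")
        shutil.rmtree(resolved)
    # Covers both "did not exist" and "just removed"; a no-op if it already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
```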
- recursive_safe_extract(zip_path, dest_dir, remove_archives=True)
Recursively extracts ZIP files, including those found inside other ZIPs.
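A rough sketch of recursive extraction using a work queue. For brevity it calls zipfile's own extractall rather than a validating extractor. The function name, and the choice to unpack each nested archive into the directory that contains it, are assumptions of this example.

```python
import zipfile
from pathlib import Path


def recursive_extract_sketch(zip_path, dest_dir, remove_archives=True):
    """Hypothetical equivalent: extract zip_path and any ZIPs found inside it."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    queue = [(Path(zip_path), dest_dir)]
    seen = set()  # resolved paths already queued, so nothing is extracted twice
    while queue:
        archive, target = queue.pop()
        seen.add(archive.resolve())
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        if remove_archives:
            archive.unlink()
        # Queue archives that appeared under the extraction target.
        for nested in target.rglob("*.zip"):
            if nested.is_file() and nested.resolve() not in seen:
                seen.add(nested.resolve())
                queue.append((nested, nested.parent))
```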
- safe_extract_zip(zip_path, dest_dir)
Extract a ZIP archive safely, preventing path traversal (zip-slip).
Validates that each member will extract under dest_dir before extraction.
- Raises:
RuntimeError – If an unsafe member path is detected.
zipfile.BadZipFile – If the archive is invalid or corrupted.
OSError – If filesystem operations fail.
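The validate-then-extract idea can be sketched like this. It is an illustration only: the function name is hypothetical, and resolving each member path against dest_dir is one common way to detect zip-slip, not necessarily the check the library performs.

```python
import zipfile
from pathlib import Path


def safe_extract_zip_sketch(zip_path, dest_dir):
    """Hypothetical equivalent: extract only if every member stays under dest_dir."""
    dest_root = Path(dest_dir).resolve()
    dest_root.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Check all members before writing anything to disk.
        for member in zf.namelist():
            target = (dest_root / member).resolve()
            if target != dest_root and dest_root not in target.parents:
                raise RuntimeError(f"Unsafe path in archive: {member!r}")
        zf.extractall(dest_root)
```

Validating the full member list before calling extractall means a single malicious entry aborts the operation without leaving a partially extracted tree behind.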