# Dataset Description

This dataset contains anonymized job-level records from the Eagle high-performance computing (HPC) system. Each record represents a Slurm batch job and includes scheduling metadata, resource requests, resource utilization, CPU and GPU energy consumption measurements, and computed efficiency metrics. Personally identifiable fields (user, account, and job name) have been replaced with cryptographic hashes. Energy metrics include both TDP-estimated CPU energy and measured node-level and GPU-level energy from the iLO and Ganglia monitoring systems.

**Developed by:** National Laboratory of the Rockies (NLR), ROR: https://ror.org/036266993

**Contributed by:** HPC Operations and Data Analytics teams at NLR.

**Dataset short description:** Anonymized Slurm job records from the NLR Eagle HPC system, including job scheduling, resource allocation, CPU and GPU energy measurements, and efficiency metrics.

**Over what timeframe was the data collected or generated? Does this timeframe align with when the underlying phenomena or events occurred?**

The dataset covers the operational lifetime of the Eagle HPC system, with timestamps in the Mountain Time zone. Slurm data was processed nightly after midnight, so the database was always current through the prior day. The collection timeframe aligns directly with the underlying job scheduling events as they occurred on the Eagle system.

**What resources were used?**

- Facilities: Eagle HPC System, National Laboratory of the Rockies (NLR), ROR: https://ror.org/036266993
- Funding: U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy (EERE).
- Other Supporting Entities: N/A

## Sharing/Access Information

**Reuse restrictions placed on the data:**

The dataset has been anonymized by hashing sensitive fields (user, account, and job name). Reuse is subject to the license specified in this datacard.
Users should not attempt to re-identify individuals from hashed fields.

**Provide DOIs and BibTeX citations to publications that cite or use the data:** N/A

**Provide DOIs, citations, or links to other publicly accessible locations of the data:** N/A

**Provide DOIs, citations, or links and descriptions of relationships to ancillary data sets:**

This dataset is derived from the Eagle schema of the NLR HPC job database.

## Data & File Overview

**List all files contained in the dataset.**

| File | Description |
| --- | --- |
| esif.hpc.eagle.job-anon.zip | Zipped Hive-partitioned Apache Parquet dataset containing anonymized job records from the Eagle Slurm scheduler. Each row is a parent job record with scheduling metadata, resource requests/usage, CPU and GPU energy measurements, and computed efficiency metrics. |
| esif.hpc.eagle.job-anon-energy-metrics.zip | Zipped Hive-partitioned Apache Parquet dataset with the same anonymized job records, plus additional energy metrics calculated from iLO and Ganglia. |
| datacard.md | This datacard file describing the dataset. |

**Describe the relationship(s) between files.**

The Parquet datasets are the primary data files. The datacard provides documentation. In the source database, each job record may have associated job_step records (not included here) that contain finer-grained per-step resource usage data, including TRESUsage fields.

**Describe any additional related data collected that was not included in the current data package.**

The source database contains additional tables not included in this extract, including job_step (per-step resource usage with TRESUsage fields).
Raw Slurm slurm_data JSONB fields have also been excluded.

**Are there multiple versions of this dataset?** N/A

## Methodological Information

**How was the data for each instance obtained or generated?**

Each instance is a parent job record collected from the Slurm workload manager on the Eagle HPC system via the sacct command. The data represents real job submissions, scheduling decisions, and resource consumption. Calculated fields (efficiency metrics, energy measurements) are derived from the raw Slurm data through database triggers and batch functions. Energy data is enriched from two additional sources: node-level power from iLO (Integrated Lights-Out) monitoring, and GPU-level power from Ganglia monitoring.

**For each instrument, facility, or source used to generate and collect the data, what mechanisms or procedures were used?**

Slurm data was collected via the sacct command and ingested through the following pipeline: Eagle Jobs API → Redpanda message queue (hpc-eagle-job topic) → StreamSets on Snowy → HPCMON API → Sage PostgreSQL database. Slurm data was processed nightly after midnight. Node-level energy data was collected from iLO (HP Integrated Lights-Out) management interfaces. GPU energy data was collected from Ganglia monitoring. Both energy sources were joined to job records via node lists and time ranges.

**To create the final dataset, was any preprocessing/cleaning/labeling of raw data done?**

Yes.
The following preprocessing was applied:

- **Anonymization:** The fields name, user, and account were replaced with cryptographic hashes to prevent re-identification.
- **Column derivation:** Several columns are calculated from raw Slurm fields, including queue_wait (start_time − submit_time), cpu_eff (TotalCPU / CPUTime), and max_mem_eff.
- **State simplification:** A state_simple column maps detailed Slurm states (e.g., "CANCELLED BY 12345") to simplified labels (e.g., "CANCELLED").
- **QoS accounting:** An accounting_qos column applies business rules: buy-in partitions are labeled "buy-in"; standby partitions are labeled "standby"; otherwise the Slurm QoS is used.
- **Energy enrichment:** CPU TDP-estimated energy is calculated from cpu_used, the CPU TDP (200 W for the Intel Xeon Gold 6154), and the core count (18 cores). Node-level measured energy is joined from iLO data. GPU-level measured energy is joined from Ganglia data.
- **Timezone handling:** Eagle's Slurm export did not include timezone offsets. Timezone-aware columns (submit_time_tz, start_time_tz, end_time_tz) were populated from the LEX accounting database, which stores correct timezone information. The non-tz columns may be incorrect across daylight saving boundaries.

**Is the software that was used to preprocess/clean/label the data available?**

The data is loaded and processed using PostgreSQL functions. These are internal to the NLR HPC operations database and are not publicly released at this time.

**Describe any standards and calibration information, if appropriate.**

All timestamps are in the Mountain Time zone. The non-timezone columns (submit_time, start_time, end_time) use the timestamp datatype without timezone and may be off by one hour across daylight saving boundaries. The timezone-aware columns (submit_time_tz, start_time_tz, end_time_tz) are sourced from LEX accounting data and correctly handle DST transitions. CPU TDP energy estimates use a fixed 200 W TDP for the Intel Xeon Gold 6154 with 18 cores per CPU. Node-level energy is measured via iLO.
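The derived columns and TDP-based energy estimate described above can be sketched as follows. This is a hedged illustration only: the actual PostgreSQL functions are internal to NLR and not public, so the function names here are mine, and the logic is simplified from the formulas given in this datacard.

```python
# Illustrative sketch of the documented derivations; not the actual
# NLR PostgreSQL functions. Constants are from this datacard.
from datetime import datetime, timedelta

CPU_TDP_WATTS = 200   # Intel Xeon Gold 6154 TDP (cpu_energy_tdp_watts)
CORES_PER_CPU = 18    # cpu_energy_num_cores

def simplify_state(state: str) -> str:
    """Map a detailed Slurm state (e.g. 'CANCELLED BY 12345') to a label."""
    return state.split()[0]

def queue_wait(submit_time: datetime, start_time: datetime) -> timedelta:
    """queue_wait = start_time - submit_time."""
    return start_time - submit_time

def cpu_eff(total_cpu_s: float, cpu_time_s: float) -> float:
    """cpu_eff = TotalCPU / CPUTime, documented as a 0-1 ratio."""
    return total_cpu_s / cpu_time_s if cpu_time_s else 0.0

def cpu_energy_tdp_max_wh(cpu_used_s: float) -> float:
    """Estimated max CPU energy: cpu_used / 3600 * 200 W TDP / 18 cores."""
    return cpu_used_s / 3600 * CPU_TDP_WATTS / CORES_PER_CPU

# Example with made-up numbers: 2 h of CPU time on a 4 h allocation.
print(simplify_state("CANCELLED BY 12345"))        # CANCELLED
print(cpu_eff(7200.0, 14400.0))                    # 0.5
print(round(cpu_energy_tdp_max_wh(7200.0), 2))     # 22.22
```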
GPU energy is measured via Ganglia.

**Describe the environmental and experimental conditions relevant to the dataset.**

The Eagle system was located at the NLR campus. Compute nodes used Intel Xeon Gold 6154 processors (18 cores, 200 W TDP). Node configurations included standard compute, bigmem, GPU, DAV, and other specialized partitions.

- Available partitions: bigmem, bigmem-8600, bigscratch, csc, dav, ddn, debug, gpu, haswell, long, mono, short, standard.
- Available QoS levels: Unknown, normal, buy-in, debug, penalty, high, standby.
- Available job states: CANCELLED, COMPLETED, FAILED, NODE_FAIL, OUT_OF_MEMORY, PENDING, RUNNING, TIMEOUT.

**Describe any quality-assurance procedures performed on the data.**

The data have been cleaned and validated through the standard data processes used to support Eagle operations. While these preprocessing and quality-control steps are integral to the dataset, the underlying software and pipelines are not publicly available.

## Data-Specific Information

**What data does each instance within the dataset consist of?**

Each instance (row) represents a single parent Slurm job on the Eagle system.
The data includes raw Slurm scheduling fields (timestamps, resource requests, resource usage, state), anonymized identifiers, derived efficiency metrics, and measured energy consumption from both node-level (iLO) and GPU-level (Ganglia) monitoring.

**Number of variables:** 62

**Number of cases/rows:** Approximately 13,800,000.

**Variable descriptions:**

| Variable Name | Description | Unit | Value Labels | Slurm sacct Field |
| --- | --- | --- | --- | --- |
| id | Unique primary key (full job ID string) | N/A | | JobID |
| job_id | Numeric job ID in Slurm | N/A | | JobIDRaw |
| array_pos | Array index if job array, else null | N/A | | ArrayTaskID |
| name_hash | Anonymized hash of the job name | N/A | hash string | JobName |
| user_hash | Anonymized hash of the submitting user | N/A | hash string | User |
| account_hash | Anonymized hash of the allocation account | N/A | hash string | Account |
| partition | HPC queue/partition requested | N/A | bigmem, bigmem-8600, bigscratch, csc, dav, ddn, debug, gpu, haswell, long, mono, short, standard | Partition |
| state | Full Slurm job state string | N/A | e.g., COMPLETED, FAILED, PENDING, RUNNING, CANCELLED BY {uid} | State |
| state_simple | Simplified job state | N/A | CANCELLED, COMPLETED, FAILED, NODE_FAIL, OUT_OF_MEMORY, PENDING, RUNNING, TIMEOUT | (derived from State) |
| submit_time | Timestamp when the job was submitted (no timezone) | timestamp | | Submit |
| start_time | Timestamp when the job started (null if PENDING, no timezone) | timestamp | | Start |
| end_time | Timestamp when the job ended (null if PENDING/RUNNING, no timezone) | timestamp | | End |
| submit_time_tz | Timezone-aware submit timestamp (from LEX data) | timestamptz | | (from LEX, not sacct) |
| start_time_tz | Timezone-aware start timestamp (from LEX data) | timestamptz | | (from LEX, not sacct) |
| end_time_tz | Timezone-aware end timestamp (from LEX data) | timestamptz | | (from LEX, not sacct) |
| nodes_req | Number of nodes requested | count | | ReqNodes |
| processors_req | Number of CPUs requested | count | | ReqCPUS |
| memory_req | Amount of memory requested | string | e.g., Slurm ReqMem format | ReqMem |
| nodes_used | Number of nodes utilized | count | | NNodes |
| processors_used | Number of CPUs utilized | count | | NCPUS |
| wallclock_req | Maximum wall time requested | interval | | Timelimit |
| wallclock_used | Wall time actually used | interval | | Elapsed |
| cpu_used | CPU time utilized | interval | | TotalCPU |
| cpu_energy_tdp_estimated_max_watt_hours | Estimated max CPU energy based on TDP (cpu_used / 3600 × 200 W TDP / 18 cores) | Wh | | (derived from TotalCPU) |
| cpu_energy_tdp_estimated_used_watt_hours | Estimated used CPU energy based on TDP (cpu_used / 3600 × cpu_eff × 200 W TDP / 18 cores) | Wh | | (derived from TotalCPU, CPUTime) |
| cpu_energy_tdp_watts | TDP of the CPU (Intel Xeon Gold 6154) | W | 200 | (hardware constant) |
| cpu_energy_num_cores | Number of cores in the CPU (Intel Xeon Gold 6154) | count | 18 | (hardware constant) |
| node_energy_total_watt_hours | Total watt-hours consumed by the node (from iLO) | Wh | | (from iLO, not sacct) |
| node_energy_node_array | Array of nodes contributing to energy calculation | N/A | | (from iLO, not sacct) |
| node_energy_avg_watts_array | Array of average watts per node | W | | (from iLO, not sacct) |
| node_energy_watt_hours_array | Array of watt-hours per node | Wh | | (from iLO, not sacct) |
| node_energy_wallclock_hours_array | Array of wallclock hours per node | hours | | (from iLO, not sacct) |
| gpu0_energy_total_watt_hours | Total watt-hours consumed by GPU 0 (from Ganglia) | Wh | | (from Ganglia, not sacct) |
| gpu1_energy_total_watt_hours | Total watt-hours consumed by GPU 1 (from Ganglia) | Wh | | (from Ganglia, not sacct) |
| gpu_energy_node_array | Array of nodes for GPU energy data | N/A | | (from Ganglia, not sacct) |
| gpu_energy_gpu_array | Array of GPU identifiers | N/A | | (from Ganglia, not sacct) |
| gpu_energy_avg_watts_array | Array of average watts per GPU | W | | (from Ganglia, not sacct) |
| gpu_energy_watt_hours_array | Array of watt-hours per GPU | Wh | | (from Ganglia, not sacct) |
| gpu_energy_wallclock_hours_array | Array of wallclock hours per GPU | hours | | (from Ganglia, not sacct) |
| gpu_energy_timeseries_timestamp_array | Array of timestamps for GPU power time series | timestamptz | | (from Ganglia, not sacct) |
| gpu_energy_timeseries_node_array | Array of node names for GPU power time series | N/A | | (from Ganglia, not sacct) |
| gpu_energy_timeseries_gpu_array | Array of GPU identifiers for power time series | N/A | | (from Ganglia, not sacct) |
| gpu_energy_timeseries_watts_array | Array of instantaneous GPU watts for power time series | W | | (from Ganglia, not sacct) |
| nodelist | Array of node names used (empty if PENDING) | N/A | Eagle node names (e.g., r103u01) | NodeList |
| qos | Quality of Service of the job | N/A | Unknown, normal, buy-in, debug, penalty, high, standby | QOS |
| queue_wait | Time the job waited in queue | interval | start_time − submit_time | (derived from Start − Submit) |
| cpu_eff | CPU efficiency (TotalCPU / CPUTime) | ratio (0–1) | | (derived from TotalCPU / CPUTime) |
| max_mem_eff | Max memory efficiency: max memory used / memory available, across all job steps | ratio (0–1) | | (derived from MaxRSS / ReqMem) |
| gpus_requested | Number of GPUs requested (GREATEST of ReqGRES and ReqTRES) | count | | ReqGRES, ReqTRES |
| gpu_nodes_occupied | Number of Eagle GPU nodes occupied during this job | count | | (derived) |
| calc_cols_updated_on | Timestamp of last update of calculated fields | timestamp | | (internal metadata) |
| created_on | Timestamp that record was created in the database | timestamp | | (internal metadata) |
| updated_on | Timestamp that record was last updated in the database | timestamp | | (internal metadata) |

**Codes used for missing data:**

| Code | Description |
| --- | --- |
| (empty/null) | Field not applicable for the job's current state (e.g., start_time is null for PENDING jobs, end_time is null for PENDING and RUNNING jobs, energy fields are null for jobs without energy data) |
| 0 | Zero values in numeric fields (e.g., processors_used = 0 for PENDING jobs, energy = 0 for jobs that have not run) |

**Specialized formats or other abbreviations used:**

- File format: Zipped Hive-partitioned Apache Parquet dataset (esif.hpc.eagle.job-anon.zip). Can be read with tools such as Apache Spark, PyArrow, pandas, DuckDB, or any Parquet-compatible reader.
- Timestamps: Mountain Time zone. Non-tz columns use timestamp without timezone and may be off by one hour across daylight saving boundaries. The _tz columns use timestamptz and are correct across DST transitions.
- Intervals: Slurm format DD-HH:MM:SS, converted to PostgreSQL interval format.
- Node names: Eagle naming convention, e.g., r103u01, r5i2n8.
- Nodelist: PostgreSQL array format, e.g., {r103u01}, or {} when empty.
- Memory request: Slurm ReqMem string format.
- Energy arrays: Several energy fields use PostgreSQL array types (varchar[], real[]) to store per-node or per-GPU breakdowns of energy measurements.
- Hash fields: Cryptographic hash strings representing anonymized values.

**Sample of the dataset:**

The dataset contains job records spanning the operational lifetime of the Eagle system, across the states CANCELLED, COMPLETED, FAILED, NODE_FAIL, OUT_OF_MEMORY, PENDING, RUNNING, and TIMEOUT, and across partitions including bigmem, bigscratch, csc, dav, debug, gpu, haswell, long, short, and standard.

## More information

**Note on timestamps and daylight saving:** Eagle's Slurm export did NOT include timezone offsets. The non-tz timestamp columns (submit_time, start_time, end_time) may be incorrect when a job crosses a daylight saving boundary. The _tz columns (submit_time_tz, start_time_tz, end_time_tz) are sourced from the LEX accounting database, which correctly stores timezone information. Use the _tz columns when computing time differences across DST boundaries.

**Note on energy data:** Energy data comes from three sources.
CPU TDP estimates are calculated from the fixed 200 W TDP of the Intel Xeon Gold 6154 (18 cores). Node-level measured energy comes from iLO (HP Integrated Lights-Out) management interfaces. GPU-level measured energy comes from Ganglia monitoring. Not all jobs have measured energy data; availability depends on the monitoring infrastructure having been active during the job's execution.

**Summary:** The Eagle HPC system operated at NLR from 2019 through 2024. Eagle was a 2,000-node, 8-petaflop system. The primary dataset is a complete accounting record of more than 13 million jobs from the Eagle supercomputer; a second dataset contains the same information plus additional energy-consumption metrics. The data are sufficiently anonymized and do not include sensitive user or project data. Data are provided in compressed Hive-partitioned Parquet format.
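Two of the specialized formats noted above (PostgreSQL array literals in nodelist and the energy arrays, and Slurm DD-HH:MM:SS intervals) need light parsing once the Parquet data is loaded. The helpers below are a hedged sketch: the function names are mine, and the parser covers only the simple, unquoted array literals shown in this datacard.

```python
# Illustrative helpers for the formats described in this datacard;
# not part of the dataset or the NLR pipeline.
from datetime import timedelta
import re

def parse_pg_array(text: str) -> list[str]:
    """Parse a simple PostgreSQL array literal, e.g. '{r103u01,r5i2n8}' or '{}'.

    Handles only unquoted elements, which is all the node-name arrays use.
    """
    inner = text.strip().strip("{}")
    return [item for item in inner.split(",") if item]

def parse_slurm_interval(text: str) -> timedelta:
    """Parse a Slurm interval string in [DD-]HH:MM:SS form."""
    m = re.fullmatch(r"(?:(\d+)-)?(\d+):(\d{2}):(\d{2})", text)
    if not m:
        raise ValueError(f"unrecognized interval: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

# Reading the extracted dataset itself (directory name is hypothetical):
# import pandas as pd
# df = pd.read_parquet("esif.hpc.eagle.job-anon/")

print(parse_pg_array("{r103u01,r5i2n8}"))   # ['r103u01', 'r5i2n8']
print(parse_pg_array("{}"))                 # []
print(parse_slurm_interval("1-02:30:00"))   # 1 day, 2:30:00
```

When computing durations such as queue_wait yourself, prefer the _tz columns as the datacard advises, since the plain timestamp columns can be off by an hour across DST boundaries.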