This case study describes a real engagement. Client identity, proprietary details, and specific metrics are anonymized or approximated under NDA.
Industrial Energy Consumption Analytics
No granular visibility into energy usage across 12 manufacturing facilities. Monthly utility bills were the only data source for energy management decisions. Equipment inefficiency and abnormal consumption events went undetected until billing cycles revealed aggregate anomalies.
IoT sensor integration pipeline for real-time energy metering data, operational monitoring dashboard, and anomaly detection for equipment efficiency optimization. Provides per-unit, per-line, and per-equipment visibility at 1-minute granularity.
This engagement built the full IoT data stack for a multi-site manufacturing energy management use case. Energy meters across 12 production facilities report to AWS IoT Core via MQTT at 1-minute intervals. A time-series ingestion pipeline writes normalized readings to InfluxDB. Grafana dashboards serve three user types: plant managers (unit-level consumption and efficiency metrics), maintenance teams (equipment-level anomaly alerts), and the central energy management function (cross-facility comparison and aggregate reporting). Anomaly detection runs on a rolling window basis and alerts maintenance teams when individual equipment consumption deviates from its baseline consumption profile by more than a configurable threshold.
The Challenge
Industrial energy metering data has several characteristics that make it different from typical IoT telemetry. Consumption patterns are equipment-specific — a press line has a different baseline and variation profile than an HVAC unit — so anomaly detection cannot use a single global threshold. Readings are affected by planned production schedule changes (a unit running at 60% capacity will have lower consumption than at 100%, which is expected, not anomalous), meaning the anomaly detection needed production schedule context to avoid false alarms during planned downtime or reduced-capacity runs. Sensor reliability varied across the 12 facilities: older meters had higher rates of null readings, stuck values, and occasional negative readings from meter rollover. The ingestion pipeline had to handle all of these without corrupting the time-series database or triggering false anomaly alerts.
How We Built It
Sensor audit and MQTT ingestion pipeline (Weeks 1–3): We audited the 340 energy meters across the 12 facilities, categorizing them by meter type, communication protocol, and historical reliability. MQTT topic structure was standardized across all facilities, with each meter identified by facility ID, production zone, and equipment ID. The AWS IoT Core ingestion pipeline validates each message (range checks, stuck-value detection, rollover correction), applies per-meter calibration factors where provided by the facilities team, and writes clean readings to InfluxDB. Invalid readings are logged to a separate PostgreSQL table for meter reliability tracking rather than being discarded silently.
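A minimal Go sketch of that validation stage, covering the range check, stuck-value detection, and rollover correction. Field names, the `maxKWh` and `stuckLimit` knobs, and the `delta` helper are illustrative assumptions, not the engagement's actual schema:

```go
package main

import "fmt"

// Reading is one meter sample (illustrative shape, not the real schema).
type Reading struct {
	MeterID string
	KWh     float64
}

// Validator holds the per-meter state needed for stuck-value detection.
type Validator struct {
	maxKWh     float64            // upper bound for the range check
	stuckLimit int                // identical consecutive readings before flagging
	last       map[string]float64 // last accepted value per meter
	stuckRuns  map[string]int     // current run length of identical values
}

func NewValidator(maxKWh float64, stuckLimit int) *Validator {
	return &Validator{
		maxKWh:     maxKWh,
		stuckLimit: stuckLimit,
		last:       map[string]float64{},
		stuckRuns:  map[string]int{},
	}
}

// Check returns the reading unchanged, or an error describing why the
// raw value should be routed to the invalid-readings table instead of InfluxDB.
func (v *Validator) Check(r Reading) (Reading, error) {
	// Range check: negative or implausibly large values are rejected.
	if r.KWh < 0 || r.KWh > v.maxKWh {
		return r, fmt.Errorf("meter %s: out of range: %.2f", r.MeterID, r.KWh)
	}
	// Stuck-value detection: too many identical consecutive readings.
	if prev, ok := v.last[r.MeterID]; ok && prev == r.KWh {
		v.stuckRuns[r.MeterID]++
		if v.stuckRuns[r.MeterID] >= v.stuckLimit {
			return r, fmt.Errorf("meter %s: stuck at %.2f", r.MeterID, r.KWh)
		}
	} else {
		v.stuckRuns[r.MeterID] = 0
	}
	v.last[r.MeterID] = r.KWh
	return r, nil
}

// delta computes consumption between consecutive cumulative counter
// readings, correcting for a single counter rollover at counterMax.
func delta(prev, curr, counterMax float64) float64 {
	if curr >= prev {
		return curr - prev
	}
	return (counterMax - prev) + curr
}
```

Routing rejected readings to a side table rather than dropping them is what makes the per-meter reliability tracking described above possible.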
InfluxDB schema design and data retention (Weeks 4–5): We designed the InfluxDB schema to support the three query patterns the dashboards require: real-time current readings (1-minute granularity, 7-day retention at full resolution), historical trend analysis (hourly aggregates, 2-year retention), and anomaly detection windows (15-minute aggregates, 90-day retention). Continuous queries in InfluxDB compute the hourly and 15-minute aggregations automatically as raw data ages, controlling storage growth without data loss for the aggregate series. Equipment baseline consumption profiles are computed weekly and stored in PostgreSQL for the anomaly detection service to reference.
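As a sketch of the downsampling setup, here is what one retention policy plus continuous query pair looks like in InfluxDB 1.x InfluxQL. Database, retention policy, and measurement names are illustrative, not the production schema:

```sql
-- Full-resolution data kept 7 days; hourly aggregates kept 2 years.
CREATE RETENTION POLICY "rp_7d" ON "energy" DURATION 7d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "rp_2y" ON "energy" DURATION 104w REPLICATION 1

-- Continuously roll raw readings up into hourly means,
-- preserving all tags (facility, zone, equipment) via GROUP BY *.
CREATE CONTINUOUS QUERY "cq_hourly_kwh" ON "energy"
BEGIN
  SELECT mean("kwh") AS "kwh"
  INTO "energy"."rp_2y"."consumption_hourly"
  FROM "energy"."rp_7d"."consumption"
  GROUP BY time(1h), *
END
```

An analogous continuous query with `time(15m)` and a 90-day retention policy feeds the anomaly detection windows.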
Anomaly detection engine (Weeks 6–8): The anomaly detection service runs in Go, consuming the 15-minute aggregate stream from InfluxDB and comparing each reading against the equipment's baseline profile adjusted for current production schedule context. Baseline profiles are segmented by production mode (full capacity, reduced capacity, planned downtime) using production schedule data from the MES integration. Anomaly scoring uses a z-score approach on the rolling 90-day baseline per equipment per production mode. Alerts fire when the anomaly score exceeds the configured threshold for two consecutive 15-minute windows — a debouncing mechanism that reduces false alarms from transient spikes without significantly delaying detection of genuine equipment issues.
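The scoring and debounce logic can be sketched in Go as follows. Struct and field names are illustrative; the production service additionally keys baselines by equipment and production mode as described above:

```go
package main

import "math"

// Baseline summarizes historical consumption for one equipment unit
// in one production mode (full capacity, reduced capacity, downtime).
type Baseline struct {
	Mean, StdDev float64
}

// Detector alerts only after the z-score threshold is exceeded in two
// consecutive 15-minute windows, debouncing transient spikes.
type Detector struct {
	Threshold float64 // z-score alert threshold (configurable per equipment)
	breaches  int     // consecutive windows above threshold
}

// zScore measures how far a reading sits from the mode-specific baseline.
func zScore(kwh float64, b Baseline) float64 {
	if b.StdDev == 0 {
		return 0
	}
	return math.Abs(kwh-b.Mean) / b.StdDev
}

// Observe ingests one 15-minute aggregate and reports whether to alert.
func (d *Detector) Observe(kwh float64, b Baseline) bool {
	if zScore(kwh, b) > d.Threshold {
		d.breaches++
	} else {
		d.breaches = 0 // a normal window resets the debounce
	}
	return d.breaches >= 2
}
```

Resetting the counter on any in-range window is the key design choice: a single spiky window never fires, while a sustained deviation fires on its second window.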
Grafana dashboards and alert routing (Weeks 9–10): Three Grafana dashboard sets were built: a plant overview for facility managers (unit-level consumption heatmap, daily consumption vs. target, month-to-date cost estimate), an equipment detail view for maintenance teams (per-equipment consumption timeline, anomaly history, maintenance action log), and a cross-facility comparison view for the central energy management function. Alert routing sends anomaly alerts to the responsible maintenance team for each equipment unit via webhook integration with the existing maintenance management system. Alert fatigue was addressed by applying the two-window debounce and by configuring per-equipment thresholds calibrated on 90 days of historical data rather than applying uniform thresholds across all equipment.
What We Delivered
Energy cost reduction of 19% was measured in the 10 weeks following full deployment across the 12 facilities, compared to the equivalent period from the prior year adjusted for production volume. The reduction is attributed to two categories of intervention enabled by the monitoring system: maintenance-driven corrections (equipment running inefficiently identified through anomaly alerts and rectified) and behavioral changes (plant managers responding to real-time consumption visibility by adjusting scheduling and operational practices).
Sensor data uptime reached 99.1% across the 340 meters in the post-deployment measurement period. The per-meter reliability tracking built into the ingestion pipeline identified 14 meters with elevated null-reading rates, which the facilities maintenance team replaced. Average alert latency from anomaly detection to maintenance team notification is 4.2 seconds; the two-window debounce adds approximately 30 minutes of detection lag for gradual anomalies but eliminates the false alarm volume that would have caused the maintenance teams to mute alerts.
The cross-facility consumption data has enabled the energy management function to perform comparisons that were previously impossible. Three facilities have been identified as systematic outliers on energy-per-unit-output metrics, and root cause investigations at two of those facilities have identified specific equipment configurations that were not optimal. These findings are being applied as retrofits with projected additional efficiency gains beyond the immediate deployment period.
Ready to build something like this?
Tell us what you are building. We will scope it, price it honestly, and give you a clear plan.
Start a Conversation
Free 30-minute scoping call. No obligation.