From SCADA to Maintenance Decision: The InverterAI Pipeline

The end-to-end shape

The job of InverterAI as a system is narrow: convert SCADA telemetry into a prioritised maintenance queue, with explanations. Everything else — the physics models, the neural network architecture, the user interface — exists in service of that single output.

That conversion happens through five stages, each with a clear input contract, a clear output contract, and a clear failure mode. The discipline of keeping the stages clean is what makes the system maintainable; the discipline of keeping the interfaces narrow is what makes it OEM-agnostic.

Stage 1 — Ingestion from heterogeneous SCADAs

Sungrow, Huawei, SMA, Ingeteam, Power Electronics, FIMER, Schneider, Ampt, GE and the rest of the Tier-1 OEMs each expose inverter telemetry through different SCADA stacks. The protocols normalise to a small set:

Modbus TCP/RTU: by far the most common in EU and Iberian fleets.
OPC UA: increasingly the default for newer commissioning, especially when the SCADA front-end is from a third party rather than the OEM.
IEC 61850: common where the inverter sits behind a substation automation layer, especially in grid-supporting installations.
File-based historian exports: CSV, Parquet or vendor-specific binary formats from historians such as OSIsoft PI, Wonderware and similar. Pragmatic for retrofits and rapid pilots.

InverterAI normalises all four into a single canonical schema at ingestion. The canonical tag set is deliberately small — around 25-40 tags per inverter — and maps directly to the physics models downstream. Anything richer than that is captured but not required.

The ingestion layer is the only place that has to know about OEM-specific quirks (signed vs unsigned integers, scaling factors, register-map differences). Everything downstream sees the canonical schema only. This is what makes the rest of the pipeline OEM-agnostic in practice, not just in marketing.
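A minimal sketch of how that OEM-specific knowledge might be confined to a per-family table. The register addresses, scale factors, tag names and the `FAMILY_X_MAP` family are all hypothetical, not taken from any real OEM register map; the sketch also assumes plain 16-bit registers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RegisterSpec:
    """How one raw 16-bit register maps onto a canonical tag."""
    address: int     # register address (hypothetical)
    canonical: str   # canonical tag name (hypothetical naming)
    scale: float     # raw integer * scale = engineering units
    signed: bool     # True if the word is two's complement


# Hypothetical map for one inverter family. Real maps come from OEM documentation.
FAMILY_X_MAP: List[RegisterSpec] = [
    RegisterSpec(100, "dc_link_voltage_v", 0.1, signed=False),
    RegisterSpec(101, "heatsink_temp_c", 0.1, signed=True),
    RegisterSpec(102, "ac_power_kw", 0.01, signed=True),
]


def decode_word(raw: int, signed: bool) -> int:
    """Interpret a 16-bit register value as signed or unsigned."""
    raw &= 0xFFFF
    if signed and raw >= 0x8000:
        return raw - 0x10000
    return raw


def to_canonical(
    registers: Dict[int, int], reg_map: List[RegisterSpec]
) -> Dict[str, Optional[float]]:
    """Translate a raw register snapshot into canonical tags.

    A register missing from the snapshot becomes None, never a default value.
    """
    record: Dict[str, Optional[float]] = {}
    for spec in reg_map:
        raw = registers.get(spec.address)
        record[spec.canonical] = (
            None if raw is None else decode_word(raw, spec.signed) * spec.scale
        )
    return record


# 0xFFCE read as signed is -50, which scales to -5.0 degrees C.
record = to_canonical({100: 8000, 101: 0xFFCE, 102: 12345}, FAMILY_X_MAP)
```

Under this arrangement, supporting another family means adding another table; the decoding functions and everything downstream stay untouched.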

Stage 2 — Cleansing and physical sanity checks

Real SCADA data is messier than tutorials acknowledge. Sensor dropouts, frozen values, clock drift, unit-conversion bugs, range-bound clipping and incomplete historian backfills all show up in the wild. Naive ML pipelines amplify these into spurious model outputs; physics-grounded pipelines reject them.

The cleansing stage applies four families of checks in sequence:

Range checks. Junction temperature cannot exceed the device's absolute maximum rating. DC-link voltage cannot exceed the rated maximum. AC current cannot exceed the rated value plus a small instrumentation tolerance. A violating sample is flagged and withheld from the models rather than passed on as input.
Physical consistency checks. AC output power cannot exceed DC input power. Junction temperature cannot be below ambient. Heatsink temperature cannot decrease faster than the thermal time constant of the heatsink mass allows.
Statistical health checks. Frozen-value detection on rolling windows. Step-change detection that flags anomalous jumps inconsistent with physical dynamics. Cross-tag correlation checks (heatsink temperature should track power output at a known time lag; if it does not, something is broken).
Clock-skew detection. Comparing inverter timestamps to a known reference (irradiance peak alignment to local solar noon, irradiance ramp alignment to sunrise) detects drift before it corrupts cycle counting downstream.
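Three of these checks can be sketched in a few lines. The function names, limits and window length below are illustrative assumptions, not InverterAI's actual thresholds; a production frozen-value check would also need to allow for legitimately constant signals such as zero power at night.

```python
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]


def range_ok(values: Series, lo: float, hi: float) -> List[bool]:
    """True where a sample exists and lies inside its physical limits."""
    return [v is not None and lo <= v <= hi for v in values]


def not_frozen(values: Series, window: int = 5, eps: float = 1e-9) -> List[bool]:
    """False for every sample in a run of `window` or more unchanged readings."""
    ok = [True] * len(values)
    start = 0  # index where the current run of unchanged readings began
    for i in range(1, len(values) + 1):
        same = (
            i < len(values)
            and values[i] is not None
            and values[i - 1] is not None
            and abs(values[i] - values[i - 1]) <= eps
        )
        if not same:
            # The run [start, i) has ended; flag it if it was long enough.
            if i - start >= window and values[start] is not None:
                for j in range(start, i):
                    ok[j] = False
            start = i
    return ok


def power_consistent(ac_kw: Series, dc_kw: Series, tol_kw: float = 0.5) -> List[bool]:
    """AC output may not exceed DC input beyond an instrumentation tolerance."""
    return [
        a is not None and d is not None and a <= d + tol_kw
        for a, d in zip(ac_kw, dc_kw)
    ]


def apply_masks(values: Series, *masks: List[bool]) -> List[Optional[float]]:
    """Replace any sample that fails a check with None. Nothing is interpolated."""
    return [v if all(m[i] for m in masks) else None for i, v in enumerate(values)]


# Hypothetical heatsink trace: one out-of-range spike, then a frozen sensor.
heatsink_c = [41.0, 41.5, 250.0, 42.0, 42.0, 42.0, 42.0, 42.0, 43.1]
cleaned = apply_masks(
    heatsink_c,
    range_ok(heatsink_c, lo=-40.0, hi=120.0),
    not_frozen(heatsink_c, window=5),
)
# cleaned == [41.0, 41.5, None, None, None, None, None, None, 43.1]
```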

Failing tags are masked rather than substituted. The downstream models see the gap, not a hallucinated value. This is critical: a black-box ML pipeline that interpolates over missing values will produce a confident answer derived partly from its own imagination. A physics-grounded pipeline that masks instead simply outputs higher uncertainty during the gap, which is the honest behaviour.

Stage 3 — Physics feature engineering

Raw SCADA tags are not what the physics models consume. The physics models consume physically-meaningful features derived from the tags. The transformation from one to the other is where decades of power-electronics reliability research are encoded.

The core features the InverterAI feature pipeline produces:

Estimated junction temperature per phase leg, via thermal-network inversion from heatsink temperature, ambient and instantaneous dissipation. The starting point is the OEM datasheet Foster network; the calibration uses fleet-observed dynamics to tighten the constants.
Rainflow cycle histograms on the Tj series, computed on rolling windows of hours and days, then aggregated to per-inverter-day fatigue increments feeding Coffin-Manson.
Estimated DC-link ripple-current trace from AC-side current harmonics and switching-frequency telemetry. Where direct DC-link current measurement exists, it shortcuts this estimation.
Capacitor core temperature estimate from heatsink temperature plus ripple-current self-heating, feeding the Arrhenius lifetime model.
Partial-load duty fraction and ambient delta over rolling windows, used as covariates in the PINN.
Event-density features — startup/shutdown counts, grid-support events, derate occurrences — that capture operating-regime context.
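As an illustration of the first feature, the sketch below estimates junction temperature by running a junction-to-heatsink Foster network forward from measured heatsink temperature and estimated dissipation. The two-branch constants are made up for the example, not datasheet values, and the function name is hypothetical. The update is exact when the power is held constant over each sample interval.

```python
import math
from typing import List, Sequence, Tuple


def junction_temperature(
    heatsink_c: Sequence[float],
    loss_w: Sequence[float],
    foster: Sequence[Tuple[float, float]],
    dt_s: float,
) -> List[float]:
    """Estimate Tj as heatsink temperature plus the Foster-network rise.

    foster: (R_th in K/W, tau in s) for each RC branch, junction to heatsink.
    Each branch follows rise[k+1] = a * rise[k] + (1 - a) * R_th * P[k],
    with a = exp(-dt / tau).
    """
    rises = [0.0] * len(foster)
    decay = [math.exp(-dt_s / tau) for _, tau in foster]
    tj = []
    for t_hs, p in zip(heatsink_c, loss_w):
        for i, (r_th, _) in enumerate(foster):
            rises[i] = decay[i] * rises[i] + (1.0 - decay[i]) * r_th * p
        tj.append(t_hs + sum(rises))
    return tj


# Hypothetical two-branch network: total R_th = 0.05 K/W.
FOSTER = [(0.02, 0.05), (0.03, 0.5)]
n = 1000  # 10 s at 10 ms sampling, long enough to reach steady state
tj = junction_temperature([60.0] * n, [1000.0] * n, FOSTER, dt_s=0.01)
# Steady state approaches 60 + 1000 * 0.05 = 110 degrees C.
```

The resulting Tj series is what a rainflow counter would then consume; calibrating the (R_th, tau) pairs against fleet dynamics is a separate step not shown here.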

These features are the inputs the PINN actually consumes. Engineering them explicitly, rather than asking the network to learn them from raw tags, is the single largest lever on prediction quality in practice. The physics enters twice: once through the features, and again through the loss that constrains the network.
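The two lifetime models named in the feature list can be written down compactly. The sketch below uses the textbook Coffin-Manson and Arrhenius forms; the coefficients `a`, `n` and the activation energy are placeholder values for illustration, and summing count over cycles-to-failure (linear damage accumulation, Miner's rule) is an assumption about how the per-day increments are formed.

```python
import math
from typing import Dict

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def cycles_to_failure(delta_tj_k: float, a: float, n: float) -> float:
    """Coffin-Manson: N_f = a * (delta_Tj) ** (-n)."""
    return a * delta_tj_k ** (-n)


def fatigue_increment(histogram: Dict[float, float], a: float, n: float) -> float:
    """Damage consumed by one rainflow histogram {delta_Tj: cycle count}.

    Assumes linear damage accumulation (Miner's rule): sum of count / N_f.
    """
    return sum(
        count / cycles_to_failure(d_tj, a, n) for d_tj, count in histogram.items()
    )


def capacitor_life_h(
    core_temp_c: float, rated_life_h: float, rated_temp_c: float, ea_ev: float
) -> float:
    """Arrhenius scaling of rated life from rated temperature to core temperature."""
    t_k = core_temp_c + 273.15
    t0_k = rated_temp_c + 273.15
    return rated_life_h * math.exp((ea_ev / K_B_EV_PER_K) * (1.0 / t_k - 1.0 / t0_k))


# Placeholder coefficients, for illustration only.
small_swings = fatigue_increment({40.0: 10.0}, a=1e14, n=5.0)
large_swings = fatigue_increment({80.0: 10.0}, a=1e14, n=5.0)
# With n = 5, doubling the temperature swing costs 2 ** 5 = 32 times the damage.

life_hot = capacitor_life_h(105.0, rated_life_h=5000.0, rated_temp_c=105.0, ea_ev=0.94)
life_cool = capacitor_life_h(85.0, rated_life_h=5000.0, rated_temp_c=105.0, ea_ev=0.94)
```

The steep exponent is why the junction temperature estimate matters so much: a modest error in the swing amplitude becomes a large error in consumed life.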

Stage 4 — PINN inference and uncertainty

With clean features in hand, the PINN inference layer produces, for each inverter and each failure mode, a Remaining Useful Life (RUL) distribution rather than a point estimate. The confidence band (the 10th to 90th percentile range, an 80% interval) is part of the output, not a footnote on it.

Inference is cheap once the network is trained — milliseconds per inverter, which lets the entire fleet be re-evaluated on every fresh SCADA window. The expensive work is the training and calibration, which happens offline on accumulated fleet data and gets repeated when material new failure observations land.

The output for each inverter is a small structured object:

Per-failure-mode RUL distribution (median, 10%, 90% percentiles).
Dominant failure mode identifier.
Contributing-feature attribution — which operating conditions drove the prediction.
Confidence-quality flag indicating whether the underlying data is rich enough to trust the prediction.

This object travels to Stage 5 as-is. It is never collapsed to a single number; collapsing is the consumer's decision, not the model's.
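One way to picture that object is as a pair of immutable records. The class names, field names and failure-mode labels below are hypothetical; only the four components themselves come from the list above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RulDistribution:
    """RUL percentiles for one failure mode, in days."""
    p10_days: float
    median_days: float
    p90_days: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.p10_days <= self.median_days <= self.p90_days:
            raise ValueError("percentiles must be non-negative and ordered")


@dataclass(frozen=True)
class InverterPrediction:
    """Stage 4 output for one inverter. Deliberately has no single-number field."""
    inverter_id: str
    rul_by_mode: Dict[str, RulDistribution]   # one distribution per failure mode
    dominant_mode: str                        # key into rul_by_mode
    attribution: Dict[str, float]             # feature name -> contribution
    data_quality_ok: bool                     # confidence-quality flag

    def __post_init__(self) -> None:
        if self.dominant_mode not in self.rul_by_mode:
            raise ValueError("dominant_mode must be one of the modelled failure modes")


# Hypothetical example values.
pred = InverterPrediction(
    inverter_id="INV-001",
    rul_by_mode={
        "igbt_thermal_fatigue": RulDistribution(210.0, 400.0, 720.0),
        "dc_link_capacitor": RulDistribution(900.0, 1500.0, 2400.0),
    },
    dominant_mode="igbt_thermal_fatigue",
    attribution={"partial_load_duty": 0.6, "ambient_delta": 0.4},
    data_quality_ok=True,
)
```

Validating the percentile ordering at construction keeps a malformed distribution from reaching the decision layer.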

Stage 5 — Decision, explanation and CMMS handoff

RUL by itself does not schedule maintenance. Real-world scheduling depends on at least four other variables that the model does not know about and should not pretend to know:

Accessibility. Site distance from the crew depot, road conditions, gate access, security clearances.
Parts availability. What is on the shelf, what is on order, what is in OEM end-of-life support.
Criticality. Block topology, energy market exposure, contracted availability terms.
Weather windows. Wind, rain, ambient temperature limits for the specific intervention.

The decision layer combines the model output with these operational variables to produce a prioritised inverter list — the queue. Each item carries its RUL distribution, its failure-mode attribution and the scheduling rationale that put it where it sits.
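To make the idea concrete, here is a toy ordering that multiplies a RUL-based urgency by operational factors and records why each factor applied. The weights, thresholds and field names are invented for illustration; the actual decision logic is not described here and is likely richer than a single multiplicative score.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    inverter_id: str
    rul_p10_days: float        # pessimistic end of the RUL distribution
    criticality: float         # 0..1, from block topology and contract terms
    parts_on_shelf: bool
    travel_hours: float
    weather_window_open: bool


def score(c: Candidate) -> Tuple[float, List[str]]:
    """Return a priority value and the human-readable reasons behind it."""
    urgency = 1.0 / max(c.rul_p10_days, 1.0)
    value = urgency * (1.0 + c.criticality)
    why = [f"RUL p10 {c.rul_p10_days:.0f} d", f"criticality {c.criticality:.2f}"]
    if not c.parts_on_shelf:
        value *= 0.5  # illustrative penalty: nothing to fit yet
        why.append("parts not on shelf")
    if not c.weather_window_open:
        value *= 0.5  # illustrative penalty: intervention not possible now
        why.append("no weather window")
    if c.travel_hours > 4.0:
        value *= 0.8  # illustrative penalty for remote sites
        why.append(f"remote site ({c.travel_hours:.1f} h travel)")
    return value, why


def build_queue(candidates: List[Candidate]) -> List[Tuple[str, float, List[str]]]:
    """Highest priority first; each row keeps its scheduling rationale."""
    rows = []
    for c in candidates:
        value, why = score(c)
        rows.append((c.inverter_id, value, why))
    return sorted(rows, key=lambda row: row[1], reverse=True)


queue = build_queue([
    Candidate("INV-B", 300.0, 0.9, True, 1.0, True),
    Candidate("INV-A", 30.0, 0.9, True, 1.0, True),
    Candidate("INV-C", 30.0, 0.9, False, 1.0, True),
])
# Order: INV-A, INV-C, INV-B
```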

The handoff to the existing CMMS is via API. InverterAI does not replace the CMMS — it feeds it. Work orders inherit the explanation as a structured note, so the technician arriving at the cabinet knows what the model expected to find and can confirm or refute it on the spot. Confirmed and refuted outcomes feed back into the calibration loop.

What the platform deliberately does not do

Two things the pipeline does not do, by design:

No control writes back to the SCADA system. InverterAI is read-only. It never issues commands, never adjusts setpoints, never participates in the control loop. This takes control-path risk out of commissioning: the worst-case behaviour of InverterAI is to be wrong about a prediction, not to destabilise an operating asset.
No proprietary OEM data contracts required. Standard SCADA tags are enough. The platform does not need firmware access, OEM-private registers, or cloud-to-cloud data sharing agreements with the inverter manufacturer.

These constraints are not laziness — they are deliberate design choices. They are what makes the platform deployable in a quarter rather than a year, and what makes it OEM-agnostic in practice rather than only in pitch decks.

Time-to-first-value

A new fleet typically goes from contract signature to usable RUL output in 60-90 days. The phases:

Weeks 1-2: Connectivity. Securing the SCADA feed, validating network paths, establishing the canonical-schema mapping for the specific inverter families.
Weeks 3-6: Data quality. Discovering and characterising the sensor faults, clock drifts and historian gaps specific to the fleet. This phase produces real value on its own: most operators learn things about their data quality that they did not previously know.
Weeks 7-10: Calibration. Tuning the thermal-network constants and Coffin-Manson / Arrhenius parameters against fleet history and any documented past failures.
Weeks 11-12: Operationalisation. CMMS integration, queue delivery into the existing O&M workflow, dashboard rollout, training.

From week 12 onwards, the platform delivers the steady-state predictive overlay. RUL confidence bands tighten over the subsequent quarters as more fleet data accumulates and the Bayesian update mechanism integrates confirmed failure outcomes.
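The tightening can be illustrated with the simplest conjugate case: a normal prior on one calibration parameter, updated by noisy observations with known variance. This is a stand-in chosen for clarity, not a description of InverterAI's actual update mechanism, and the numbers are arbitrary.

```python
from typing import Iterable, Tuple


def normal_update(
    prior_mean: float, prior_var: float, observations: Iterable[float], obs_var: float
) -> Tuple[float, float]:
    """Sequential Bayesian update of a normal prior with known observation variance."""
    mean, var = prior_mean, prior_var
    for y in observations:
        gain = var / (var + obs_var)      # how far to move toward the observation
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var          # posterior variance always shrinks
    return mean, var


# Arbitrary illustration: prior N(8, 4), four observations of 10 with variance 1.
mean_1, var_1 = normal_update(8.0, 4.0, [10.0], obs_var=1.0)
mean_4, var_4 = normal_update(8.0, 4.0, [10.0] * 4, obs_var=1.0)
# var_1 = 0.8; var_4 = 1 / (1/4 + 4/1), roughly 0.235
```

Each observation moves the estimate toward the data and shrinks the variance, which is the behaviour the confidence bands show as confirmed outcomes accumulate.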

What makes this pipeline robust

The five-stage discipline is what makes the pipeline work outside the lab. Each stage has bounded responsibilities, clear failure modes, and explicit interfaces. The physics carries through the whole pipeline as features, equations and constraints — not as a marketing layer applied on top of a black-box model.

Operationally, this matters because:

Failures in any stage are localised. A bad sensor in Stage 2 does not propagate into a confident wrong prediction in Stage 4; it propagates into masked data and widened uncertainty, which is the honest behaviour.
Adding a new inverter family touches Stage 1 (canonical mapping) and Stage 3 (thermal-network constants) but not Stages 4 and 5. The physics is the physics.
Calibration improvements compound. Every confirmed failure feeds back into the Bayesian update; the platform gets better at predicting that specific failure mode on that specific family, automatically.
