AI Data Center Grid Connection Case Study

Case study of an AI data center grid connection covering large electrical load, substation capacity, power quality, cooling demand, staged energization, resilience, and operations.

This case study follows a realistic engineering scenario: connecting a large AI training data center to a constrained regional power grid. It is illustrative and does not describe a specific facility. The scenario is useful because high-density computing changes the shape of the electrical, thermal, civil, telecommunications, and operations problem at the same time.

The proposed facility is planned in three phases. Phase 1 brings 40 MW of IT load online. Phase 2 increases the IT load to 90 MW. Phase 3 reserves space and utility capacity for 150 MW of IT load. The site is attractive because it has fiber routes, land availability, and access to a transmission corridor, but the local distribution network was not built for a single load of this size.

The central engineering question is:

How can a high-density AI data center be connected without weakening grid reliability, facility resilience, cooling performance, or future expansion options?

The answer is not a single transformer rating. The connection has to be treated as a controlled interface between an evolving computing load, a thermal plant, protection systems, utility planning rules, and operational procedures. The useful design object is therefore a verified load envelope: how much power the site may import, how fast it may change, what power quality it must maintain, and what it must do during degraded grid or facility states.

Initial Request

The project team begins with a load request based on IT capacity. Early estimates assume a power usage effectiveness (PUE, the ratio of total facility power to IT power) near 1.2 at mature operation. At 90 MW of IT load, this implies a facility load near:

P_{facility}=PUE \cdot P_{IT}=1.2(90\ \text{MW})=108\ \text{MW}

This simple estimate is useful, but it is not enough for interconnection. The utility and design team need to know:

  • maximum coincident demand;
  • ramp rate during workload changes;
  • inrush and energization sequence;
  • power factor and reactive power range;
  • harmonic emissions from UPS systems, drives, and power supplies;
  • short-circuit contribution and protection behavior;
  • redundancy and maintenance states;
  • backup generation operation;
  • flexibility or curtailment capability;
  • cooling plant electrical demand during extreme weather.

The first lesson appears early: an AI data center cannot be treated as only a flat megawatt block. The shape, timing, power quality, and contingency behavior of the load matter.

The owner converts the business phasing plan into an engineering load table before submitting the detailed connection package:

| Phase | IT load | Planning PUE | Estimated facility demand | Connection implication |
|-------|---------|--------------|---------------------------|------------------------|
| 1 | 40 MW | 1.25 | 50 MW | Dedicated feeders and substation reinforcement may be sufficient. |
| 2 | 90 MW | 1.20 | 108 MW | A high-voltage service, transformer expansion, and full protection study are required. |
| 3 | 150 MW | 1.18 | 177 MW | Transmission-level upgrades or firm flexibility obligations may be unavoidable. |

These values are not procurement ratings. They are study inputs that must be corrected for ambient conditions, redundancy states, battery charging, cooling-plant operating mode, future rack density, and the difference between average and coincident peak load.
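The load table can be reproduced with a few lines of Python. This is a minimal sketch: the `Phase` class, the `study_demand_mw` helper, and its `correction_factor` argument are hypothetical names introduced for illustration, and the correction factor stands in for the ambient, redundancy, and coincidence adjustments that a real study would derive separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    it_load_mw: float
    planning_pue: float

    @property
    def facility_demand_mw(self) -> float:
        # P_facility = PUE * P_IT
        return self.planning_pue * self.it_load_mw


# Values taken from the phasing table in the text.
PHASES = [
    Phase("Phase 1", 40.0, 1.25),
    Phase("Phase 2", 90.0, 1.20),
    Phase("Phase 3", 150.0, 1.18),
]


def study_demand_mw(phase: Phase, correction_factor: float = 1.0) -> float:
    """Scale the planning estimate by a hypothetical study correction factor."""
    return phase.facility_demand_mw * correction_factor


if __name__ == "__main__":
    for phase in PHASES:
        print(f"{phase.name}: {phase.facility_demand_mw:.0f} MW")
```

Keeping the planning estimate and the correction factor separate makes it harder to mistake a study input for a procurement rating.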

Interconnection Study Basis

The utility requires the project team to define the site as a dynamic load with explicit envelopes. A connection study for this case includes:

  • real-power import limits for each phase and each operating state;
  • reactive-power capability at the point of interconnection;
  • maximum ramp rate during workload dispatch, cooling startup, and recovery after outage;
  • transformer energization sequence and inrush assumptions;
  • harmonic-current spectrum for UPS systems, server power supplies, variable-speed drives, and battery converters;
  • short-circuit levels and breaker interrupting duties before and after expansion;
  • N-1 loading for utility supply paths and site transformers;
  • under-voltage, over-voltage, frequency, and transfer behavior;
  • telemetry, metering, and control requirements for utility operations;
  • emergency operating rules for load shedding, battery discharge, and backup generation.

For engineers, the important distinction is between installed capacity and admissible operation. A site may own enough equipment to serve a large load, but the interconnection agreement may still restrict ramp rate, reactive exchange, backup-generation export, or operation during grid constraints.

The study converts those restrictions into an operating envelope:

| Envelope item | Engineering control | Evidence required |
|---------------|---------------------|-------------------|
| Maximum import | Site demand limiter, metering, and operator alarm. | Point-of-interconnection demand trend below approved limit. |
| Ramp rate | Workload scheduler, UPS charging control, and cooling startup sequence. | Metered ramp test during staged load increase and recovery. |
| Reactive power | Inverter and power-factor correction settings. | Power factor or reactive exchange inside agreed band. |
| Harmonic distortion | Converter specifications, filters, and measurement points. | Harmonic survey under representative IT and cooling load. |
| Backup operation | Interlocks preventing unapproved export or unsafe islanding. | Transfer tests and protection records. |
| Flexibility block | Dispatch rule tied to workload, thermal, and battery state. | Verified response duration, rebound limit, and telemetry record. |

This table becomes more useful than a single nameplate capacity because it states what the facility is allowed to do, how that behavior is controlled, and how it will be proven.
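One way to make the envelope testable is to express it as data and compare each metered sample against it. The sketch below covers only four of the envelope items; the field names and any limit values passed in are hypothetical, and an actual agreement would define its own quantities, bands, and averaging windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    max_import_mw: float
    max_ramp_mw_per_min: float
    min_power_factor: float
    max_thd_percent: float


@dataclass(frozen=True)
class Measurement:
    import_mw: float
    ramp_mw_per_min: float  # signed: positive when load is rising
    power_factor: float
    thd_percent: float


def envelope_violations(env: Envelope, m: Measurement) -> list[str]:
    """Return the envelope items that a single metered sample violates."""
    violations = []
    if m.import_mw > env.max_import_mw:
        violations.append("maximum import")
    # Assumption: the ramp limit applies to both load pickup and load drop.
    if abs(m.ramp_mw_per_min) > env.max_ramp_mw_per_min:
        violations.append("ramp rate")
    if m.power_factor < env.min_power_factor:
        violations.append("reactive power")
    if m.thd_percent > env.max_thd_percent:
        violations.append("harmonic distortion")
    return violations
```

An empty list means the sample is inside the envelope; a non-empty list names the rows of the table that need operator attention.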

Site and Grid Constraints

The nearest substation has spare capacity on paper, but several constraints appear during study. A nearby industrial feeder already creates seasonal peaks. The transmission corridor has planned maintenance windows. Short-circuit levels at the proposed point of interconnection require switchgear review. The region also has summer cooling peaks that coincide with the data center’s highest cooling demand.

The utility requests staged connection studies. Phase 1 can be served with reinforcement to the existing substation and dedicated feeders. Phase 2 requires a new high-voltage service and transformer capacity. Phase 3 may require transmission-level upgrades or onsite flexibility to avoid unacceptable contingency loading.

The data center team also discovers that grid capacity and schedule are coupled. Servers can be ordered faster than substations, transformers, protection studies, permits, and transmission upgrades can be completed. The grid connection becomes a project-critical path item rather than a late utility detail.

The schedule review exposes a practical risk: if Phase 2 IT procurement is committed before utility upgrade gates are closed, the project can end up with stranded compute hardware, temporary operating restrictions, or expensive interim generation. The connection plan therefore becomes part of the investment decision, not just a facilities workstream.

Electrical Architecture

The facility concept uses a high-voltage utility interconnection feeding site substations. Medium-voltage distribution supplies data halls, UPS systems, cooling plant, pumps, network rooms, and support buildings. Low-voltage distribution reaches power distribution units and rack systems.

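For a balanced three-phase load, real power is:

P=\sqrt{3}V_L I_L PF

where V_L is the line-to-line voltage, I_L is the line current, and PF is the power factor.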

At this scale, voltage level strongly affects current, conductor size, losses, switchgear ratings, and protection design. Higher-voltage distribution can reduce current and losses, but it also changes equipment, safety procedures, arc-flash analysis, and maintenance requirements.

The electrical design must preserve maintainability. A design that can serve the load only when all equipment is in normal configuration is too fragile. The team defines load blocks, maintenance bypass paths, spare breaker positions, metering points, and isolation points before procurement.

The control interface is also specified before procurement. The site cannot rely on informal operator action to satisfy a grid constraint. Import limiting, battery dispatch, UPS recharge limits, cooling restart sequencing, and workload throttling need defined control ownership, alarm priority, manual override rules, and cybersecurity review. A control that exists in a building-management screen but is not integrated with electrical metering, workload scheduling, and operator authority is not a dependable grid-control function.

Worked Apparent-Power and N-1 Check

At Phase 2, the estimated facility demand is 108 MW. If the site must hold a minimum power factor of 0.95 at the point of interconnection, the apparent-power requirement is:

\displaystyle S=\frac{P}{PF}=\frac{108\ \text{MW}}{0.95}=113.7\ \text{MVA}

At a 132 kV three-phase interconnection, the corresponding line current is:

\displaystyle I_L=\frac{S}{\sqrt{3}V_L}=\frac{113.7\times 10^6}{\sqrt{3}(132\times 10^3)}=497\ \text{A}

At 33 kV, the same apparent power would require about:

\displaystyle I_L=\frac{113.7\times 10^6}{\sqrt{3}(33\times 10^3)}=1989\ \text{A}

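This comparison explains why voltage level is not a cosmetic choice. Dropping from 132 kV to 33 kV multiplies the line current by four and, for the same conductor resistance, the resistive losses by sixteen. Lower-voltage service also increases conductor requirements, switchgear duty, and the difficulty of maintaining selective protection.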
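The same arithmetic can be checked in code. This is a small sketch of the two formulas used above; the function names are illustrative, not part of any library.

```python
import math


def apparent_power_mva(p_mw: float, power_factor: float) -> float:
    """S = P / PF for a load at a given power factor."""
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power factor must be in (0, 1]")
    return p_mw / power_factor


def line_current_a(s_mva: float, v_line_kv: float) -> float:
    """I_L = S / (sqrt(3) * V_L) for a balanced three-phase connection."""
    return s_mva * 1e6 / (math.sqrt(3) * v_line_kv * 1e3)


if __name__ == "__main__":
    s = apparent_power_mva(108.0, 0.95)
    for kv in (132.0, 33.0):
        print(f"{kv:.0f} kV: {line_current_a(s, kv):.0f} A at {s:.1f} MVA")
```

Because current scales inversely with voltage, the ratio between the two results equals the voltage ratio of 132/33 = 4.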

The transformer redundancy check is just as important. Suppose an early concept uses two 80 MVA transformers. In normal operation, the total nameplate capacity is 160 MVA, which appears sufficient. Under an N-1 transformer outage, however, the remaining capacity is only 80 MVA:

S_{N-1}=80\ \text{MVA}<113.7\ \text{MVA}

The design therefore cannot claim full Phase 2 service continuity after one transformer outage. The team has three defensible options:

  1. install additional transformer capacity, such as a three-transformer arrangement where one unit can be unavailable;
  2. accept a defined load-shed state with contractual and thermal consequences;
  3. split Phase 2 into a smaller firm block and a flexible block that is energized only when the grid and site configuration allow it.

This is a planning decision, not just an equipment decision. It affects customer commitments, workload scheduling, cooling redundancy, maintenance windows, and the value of onsite storage.
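The N-1 comparison and the firm/flexible split in option 3 can be sketched as follows. The helper names are hypothetical, and the sketch deliberately ignores short-term overload ratings, ambient derating, and bus configuration, all of which a real transformer study would include.

```python
def n_minus_1_capacity_mva(ratings_mva: list[float]) -> float:
    """Nameplate capacity remaining after losing the largest single unit."""
    if not ratings_mva:
        return 0.0
    return float(sum(ratings_mva) - max(ratings_mva))


def firm_flexible_split(required_mva: float, ratings_mva: list[float]) -> tuple[float, float]:
    """Split a requirement into a firm block (served under N-1) and a flexible block."""
    firm = min(required_mva, n_minus_1_capacity_mva(ratings_mva))
    return firm, required_mva - firm


if __name__ == "__main__":
    # Two 80 MVA units: only 80 MVA of the 113.7 MVA requirement is firm.
    print(firm_flexible_split(113.7, [80.0, 80.0]))
    # Three 80 MVA units: the whole requirement is firm.
    print(firm_flexible_split(113.7, [80.0, 80.0, 80.0]))
```

With two units, roughly 33.7 MVA would have to be treated as a flexible or sheddable block; with three, the full Phase 2 requirement survives a single transformer outage on a nameplate basis.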

UPS, Ride-Through, and Backup Strategy

The facility needs ride-through for utility disturbances and controlled transition to backup power. UPS systems protect IT equipment from short interruptions, voltage sags, and transfer events. Backup generation or alternative backup supply supports longer outages and selected cooling loads.

The team separates three categories of load:

  1. critical IT and network loads requiring continuous power;
  2. cooling loads needed to keep IT equipment within thermal limits;
  3. support loads that can ride through, shed, or restart in sequence.

This distinction prevents oversizing every system as if it had the same criticality. It also exposes a thermal problem: IT equipment can continue producing heat during a power disturbance while cooling systems transition. UPS autonomy is therefore linked to cooling ride-through, not only server uptime.

The project defines a minimum ride-through narrative for every credible disturbance. For example, a short voltage sag should be absorbed by UPS systems without IT interruption; a feeder outage should transfer selected blocks without losing cooling control; a longer utility outage should move the facility into a declared operating state with protected IT load, reduced noncritical load, and verified thermal margins. Without this sequence, autonomy minutes on a battery datasheet do not prove resilience.

Power Quality and Harmonics

The data center includes many power-electronic loads: server power supplies, UPS rectifiers and inverters, variable-speed drives, battery systems, and cooling equipment. These devices can affect harmonic distortion, power factor, voltage regulation, and protection coordination.

The project team performs harmonic studies for each phase. Filters and equipment specifications are adjusted before procurement. Metering is placed so that harmonic levels can be measured at the utility interface, site substations, UPS input, and major cooling plant feeders.

Power quality is treated as an operating requirement, not a one-time calculation. If future IT hardware changes the load spectrum, the site needs measurement evidence and spare mitigation capacity. Otherwise, a facility that passed initial studies may create problems after expansion.

The engineering control is to reserve margin in both the study and the physical system. Spare filter positions, harmonic metering, firmware-change governance for converters, and acceptance testing under representative IT loads are cheaper than discovering after occupancy that a new accelerator generation has changed the harmonic profile.
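A harmonic survey usually reduces a measured spectrum to a few summary figures. The sketch below computes current total harmonic distortion (THD) from a spectrum using the standard definition; the spectrum values in the example are invented, and the applicable limits and the exact metric are set by the utility and the governing standard, not by this code.

```python
import math


def thd_percent(harmonics_a: dict[int, float], fundamental_a: float) -> float:
    """Current THD in percent: sqrt(sum of I_h^2 for h >= 2) / I_1 * 100."""
    if fundamental_a <= 0.0:
        raise ValueError("fundamental current must be positive")
    rss = math.sqrt(sum(i * i for i in harmonics_a.values()))
    return 100.0 * rss / fundamental_a


if __name__ == "__main__":
    # Hypothetical spectrum: harmonic order -> RMS current in amperes.
    spectrum = {5: 3.0, 7: 4.0}
    print(f"THD = {thd_percent(spectrum, 100.0):.1f} %")
```

Recomputing this figure after each hardware refresh, using the same measurement points, is one way to turn the margin described above into a tracked quantity.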

Cooling Demand Coupled to Grid Demand

The cooling plant is a major electrical load. During hot weather, chiller, pump, fan, and cooling tower or dry-cooler loads rise. This can coincide with regional grid peaks. The team evaluates air cooling, liquid cooling, economizer operation, chilled-water storage, and higher-temperature liquid loops.

The cooling calculation begins with the fact that nearly all IT power becomes heat. A simplified cooling load is:

\dot{Q}_{cooling}\approx P_{IT}+P_{aux}+P_{losses}-\dot{Q}_{recovered}
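As a rough numeric illustration of this balance, the sketch below evaluates the heat load and converts it to cooling-plant electrical demand through a plant-level coefficient of performance (COP). The COP and all input values are assumptions chosen only so that the result is consistent with the 108 MW Phase 2 estimate; they are not design figures from the case.

```python
def cooling_heat_load_mw(p_it: float, p_aux: float, p_losses: float,
                         q_recovered: float = 0.0) -> float:
    """Q_cooling ~ P_IT + P_aux + P_losses - Q_recovered (all in MW)."""
    return p_it + p_aux + p_losses - q_recovered


def cooling_electrical_mw(heat_load_mw: float, plant_cop: float) -> float:
    """Electrical input needed to reject a heat load at a given plant COP."""
    if plant_cop <= 0.0:
        raise ValueError("COP must be positive")
    return heat_load_mw / plant_cop


if __name__ == "__main__":
    # Hypothetical Phase 2 inputs: 90 MW IT, 2 MW auxiliaries, 4 MW losses.
    q = cooling_heat_load_mw(90.0, 2.0, 4.0)
    # A plant COP of 8 is an assumed annual-average value; hot-day COP is lower.
    print(q, cooling_electrical_mw(q, 8.0))
```

With these inputs the overhead is 2 + 4 + 12 = 18 MW, matching a PUE of 1.2 at 90 MW of IT load. A lower COP during extreme heat raises the cooling demand and therefore the coincident peak.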

For high-density AI racks, liquid cooling reduces some room airflow requirements and can allow warmer heat rejection. However, it also adds pumps, coolant distribution units, leak monitoring, and controls. The final design uses a hybrid approach: direct liquid cooling for accelerator racks, air cooling for residual heat, and staged heat rejection sized for degraded operation.

Cooling and grid planning are reviewed together. A facility demand-response promise is not credible if the cooling system cannot reduce load safely during peak heat.

The thermal model is therefore linked to grid operating cases. A peak-grid event at mild ambient conditions may allow temporary cooling-power reduction. A peak-grid event during high ambient temperature may require a different response: workload curtailment, battery support, chilled-water storage discharge, or a smaller committed flexibility block. The load-reduction product must be constrained by junction temperature, coolant supply temperature, room dew point, and recovery time.

Staged Energization

The project uses staged energization instead of bringing all equipment online at once. Staging reduces risk to the grid and gives the facility team time to validate assumptions.

The energization sequence includes:

  • utility feeder and protection checks;
  • transformer energization and inrush review;
  • medium-voltage distribution tests;
  • UPS commissioning;
  • cooling plant startup;
  • load-bank testing;
  • first IT row energization;
  • rack-density ramp;
  • integrated systems testing;
  • utility-interface monitoring during each step.

Staged energization also helps distinguish design errors from operating errors. If voltage fluctuation, harmonic distortion, thermal instability, or control oscillation appears during a step, the team can stop before the full load is connected.

For Phase 1, the team defines hold points at 25%, 50%, 75%, and 100% of the approved load block. Each hold point requires a signed comparison between measured demand, power factor, harmonic distortion, cooling response, and the predicted envelope. The next step is not authorized because a checklist is complete; it is authorized because measured behavior remains inside the connection study assumptions.
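The hold-point rule can be written as a simple gate. In this sketch the approved load block is assumed to be the 50 MW Phase 1 facility demand, and the quantity names and bands are hypothetical placeholders for whatever the connection study predicts at each step.

```python
HOLD_FRACTIONS = (0.25, 0.50, 0.75, 1.00)


def hold_point_targets_mw(approved_block_mw: float) -> list[float]:
    """Load targets at each hold point of the approved block."""
    return [fraction * approved_block_mw for fraction in HOLD_FRACTIONS]


def authorize_next_step(measured: dict[str, float],
                        predicted: dict[str, tuple[float, float]]) -> tuple[bool, list[str]]:
    """Authorize only if every predicted quantity was measured and is in band."""
    failures = []
    for quantity, (low, high) in predicted.items():
        value = measured.get(quantity)
        # A missing measurement counts as a failure, not as a pass.
        if value is None or not (low <= value <= high):
            failures.append(quantity)
    return (not failures, failures)


if __name__ == "__main__":
    print(hold_point_targets_mw(50.0))
    band = {"demand_mw": (0.0, 13.0), "power_factor": (0.95, 1.0)}
    print(authorize_next_step({"demand_mw": 12.4, "power_factor": 0.97}, band))
```

Treating a missing measurement as a failure encodes the point made above: a completed checklist without measured evidence does not authorize the next step.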

Demand Flexibility

The utility asks whether the data center can provide demand flexibility. The answer depends on workload type. Some AI training jobs can shift in time or reduce power temporarily. Some inference or customer-facing workloads cannot be interrupted without service consequences.

The facility team defines a hierarchy:

  1. noncritical batch workloads that can be delayed;
  2. training workloads that can checkpoint before power reduction;
  3. cooling setpoint adjustments within thermal margin;
  4. battery discharge for short peak support;
  5. emergency load shedding as a last resort.

This hierarchy is important because flexibility is not only an electrical feature. It requires software scheduling, thermal margin, battery state of charge, service-level agreements, and operator authority. A demand-response contract that ignores these constraints can create reliability risk.

The team separates theoretical flexibility from verified flexibility. A verified flexibility block has a baseline, a response time, a minimum duration, telemetry, rebound rules, and a list of operating conditions under which it is unavailable. A simple metering expression is:

P_{flex}(t)=P_{baseline}(t)-P_{metered}(t)-P_{rebound,allocated}(t)

This prevents over-crediting a short load reduction that is followed by an uncontrolled recovery peak. For the data center, flexibility is accepted only when the workload scheduler, UPS or battery controller, cooling plant, and operator procedure can execute the same response during an integrated test.
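The metering expression can be applied interval by interval. The sketch below is one possible reading: the rebound allocation is taken as a given input series, and the "sustained" credit is assumed to be the minimum interval value over the event window, floored at zero. Both choices are illustrative and would be fixed by the actual flexibility agreement.

```python
def flex_series_mw(baseline: list[float], metered: list[float],
                   rebound_allocated: list[float]) -> list[float]:
    """P_flex(t) = P_baseline(t) - P_metered(t) - P_rebound,allocated(t)."""
    if not (len(baseline) == len(metered) == len(rebound_allocated)):
        raise ValueError("all series must cover the same intervals")
    return [b - m - r for b, m, r in zip(baseline, metered, rebound_allocated)]


def sustained_flex_mw(flex: list[float]) -> float:
    """Assumed crediting rule: the reduction held in every interval of the event."""
    if not flex:
        return 0.0
    return max(0.0, min(flex))


if __name__ == "__main__":
    # Hypothetical three-interval event, values in MW.
    flex = flex_series_mw([100.0, 100.0, 100.0], [90.0, 90.0, 92.0], [2.0, 2.0, 2.0])
    print(flex, sustained_flex_mw(flex))
```

In this example the nominal 10 MW reduction is credited as 6 MW once the rebound allocation and the weakest interval are taken into account.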

The project therefore maintains a flexibility availability matrix:

  • available at mild ambient temperature with noncritical training workloads;
  • partially available when chilled-water storage or battery state of charge is below target;
  • unavailable during cooling-plant fault recovery;
  • unavailable during utility restoration until UPS recharge limits are satisfied;
  • unavailable during customer workloads with strict service-level commitments.

This prevents a commercial flexibility promise from becoming an engineering hazard during the exact conditions when the grid is stressed.
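The availability matrix can be encoded as an explicit rule so that the scheduler, the operators, and the commercial team all read the same logic. This is a sketch under stated assumptions: the `SiteState` fields are hypothetical flags, and any state the list above does not cover is mapped to partial availability as a placeholder pending study.

```python
from dataclasses import dataclass
from enum import Enum


class Availability(Enum):
    AVAILABLE = "available"
    PARTIAL = "partial"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class SiteState:
    mild_ambient: bool
    noncritical_training_running: bool
    storage_or_battery_below_target: bool
    cooling_fault_recovery: bool
    ups_recharge_limited: bool
    strict_sla_workloads: bool


def flexibility_availability(state: SiteState) -> Availability:
    # Blocking conditions from the matrix take priority over everything else.
    if (state.cooling_fault_recovery or state.ups_recharge_limited
            or state.strict_sla_workloads):
        return Availability.UNAVAILABLE
    if state.storage_or_battery_below_target:
        return Availability.PARTIAL
    if state.mild_ambient and state.noncritical_training_running:
        return Availability.AVAILABLE
    # Assumption: states not listed in the matrix are treated as partial.
    return Availability.PARTIAL
```

Checking the blocking conditions first means that a favorable ambient temperature can never mask a cooling fault or an unfinished UPS recharge.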

Failure Mode Review

The project team runs failure-mode reviews across utility, electrical, cooling, network, and operations systems. Several common-cause risks are identified:

  • two redundant feeders pass through the same construction area;
  • cooling controls and electrical monitoring share one network path;
  • backup generators depend on one fuel delivery route;
  • liquid-cooling manifolds for adjacent high-density racks share one isolation valve;
  • planned maintenance can leave the facility less redundant than assumed;
  • a grid event during high ambient temperature creates both power and thermal stress.

The team updates the design to separate routing, add monitoring independence, define maintenance states, and clarify load-shedding rules. The case shows why redundancy diagrams are not enough. Physical routing, controls dependencies, and operating states must be reviewed.

The review also changes documentation. Single-line diagrams are supplemented with route drawings, controls network diagrams, fuel logistics assumptions, maintenance-state matrices, and alarm-response procedures. This matters because a facility can be electrically redundant on paper and still be operationally fragile if redundant paths share a room, trench, communication switch, spare part, or maintenance crew.

Commissioning Evidence

Commissioning focuses on integrated evidence. The team does not accept equipment startup forms as proof that the facility is ready for service.

Required evidence includes:

  1. utility-interface metering under staged load;
  2. transformer and switchgear thermal checks;
  3. UPS transfer and ride-through tests;
  4. backup generation load acceptance and return-to-normal tests;
  5. cooling response during step changes in IT load;
  6. harmonic and power-factor measurements;
  7. liquid-cooling leak detection and pump failover tests;
  8. network-path and monitoring-system failover;
  9. demand-response simulation with workload and thermal constraints;
  10. operator procedures for degraded electrical and cooling states.

The commissioning record becomes part of the operating baseline. Later expansion must be compared against the measured behavior of earlier phases, not only against the original model.

The acceptance criteria are written before testing begins. For this case, Phase 1 cannot enter steady commercial operation until the project has demonstrated:

  • measured facility demand within the approved import envelope at each load hold point;
  • point-of-interconnection power factor inside the utility-agreed operating band;
  • harmonic measurements below the applicable interconnection limits under representative IT and cooling load;
  • successful UPS ride-through and transfer tests without uncontrolled IT or cooling shutdown;
  • cooling recovery after a step change in IT load without exceeding thermal limits;
  • validated load-shed and demand-flexibility sequences with no unacceptable rebound peak;
  • protection-trip and alarm evidence traceable to the latest approved settings;
  • operator logs showing that degraded-state procedures are executable by the site team.

These criteria make the commissioning package useful for future expansion. When Phase 2 begins, the project team can compare new simulations with measured Phase 1 behavior instead of restarting from assumptions.

Outcome

Phase 1 proceeds after substation reinforcement, metering upgrades, and integrated systems testing. Phase 2 is tied to a new high-voltage service, additional transformer capacity, and confirmed cooling-plant expansion. Phase 3 remains conditional on transmission upgrades, demand-flexibility agreements, and updated grid studies.

The project does not solve every constraint by building more equipment. It uses staged capacity, clear operating envelopes, utility coordination, power-quality monitoring, thermal validation, and workload flexibility. The most important design decision is to treat grid connection as a system interface rather than a utility formality.

Transfer Lessons

This case study gives several durable lessons:

  • large data centers should be studied as dynamic electrical loads, not only static MW requests;
  • cooling demand and grid demand are coupled, especially during hot weather;
  • UPS autonomy must be coordinated with thermal ride-through;
  • power quality should be monitored through expansion, not only calculated at design time;
  • staged energization reduces risk and creates evidence;
  • demand flexibility requires coordination between electrical systems, thermal systems, software, and contracts;
  • redundancy must be checked for common-cause failures and maintenance states.

The engineering significance is that AI infrastructure stresses interfaces. The facility connects chips to racks, racks to cooling loops, cooling loops to power systems, power systems to the grid, and workloads to operating rules. A strong design manages those interfaces explicitly instead of assuming that each discipline can solve its part in isolation.
