NERC - MGHPCC Power Loss is causing our system-wide unavailability of NERC services – Incident details

GETTING HELP
Email: help@nerc.mghpcc.org, or use NERC's Support Ticketing System
NERC Documentation: https://nerc-project.github.io/nerc-docs/
Status page for the New England Research Cloud (NERC) and other resources.
Please scroll down to see details on any Incidents or maintenance notices.

MGHPCC Power Loss is causing our system-wide unavailability of NERC services

Monitoring
Major outage
Started 10 days ago

Affected

MGHPCC SHARED SERVICES (MGHPCC-SS) ACCOUNT PORTAL

Major outage from 1:00 PM to 12:20 AM, Partial outage from 12:20 AM to 4:56 PM, Major outage from 4:56 PM to 4:09 PM, Operational from 4:09 PM to 12:00 AM

NERC COLDFRONT

Major outage from 1:00 PM to 12:20 AM, Operational from 12:20 AM to 4:56 PM, Major outage from 4:56 PM to 4:09 PM, Operational from 4:09 PM to 12:00 AM

NERC OPENSTACK

Major outage from 1:00 PM to 12:20 AM, Partial outage from 12:20 AM to 4:56 PM, Major outage from 4:56 PM to 4:09 PM, Operational from 4:09 PM to 12:00 AM

OPENSTACK DASHBOARD (HORIZON)

Major outage from 1:00 PM to 12:20 AM, Partial outage from 12:20 AM to 4:56 PM, Major outage from 4:56 PM to 4:09 PM, Operational from 4:09 PM to 12:00 AM

NERC OPENSTACK COMPUTE SERVICE (NOVA)

Major outage from 1:00 PM to 12:20 AM, Partial outage from 12:20 AM to 4:56 PM, Major outage from 4:56 PM to 4:09 PM, Operational from 4:09 PM to 12:00 AM

OPENSTACK NETWORKING SERVICE (NEUTRON)

Major outage from 1:00 PM to 12:20 AM, Partial outage from 12:20 AM to 4:56 PM, Major outage from 4:56 PM to 4:09 PM, Operational from 4:09 PM to 12:00 AM

Updates
  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Investigating
    Front-end shared services - most critically Keycloak, ColdFront, and RegApp - are currently unreachable. We are continuing to work on resolving this incident.

    You can continue to use the NERC OCP (OpenShift) Production and NERC EDU (Academic) clusters as normal to deploy your workloads.

  • Identified
    We are continuing to work on a fix for this incident.
  • Investigating
    We are currently investigating this incident.