NERC - Notice history

GETTING HELP
Email: help@nerc.mghpcc.org, or use NERC's Support Ticketing System
NERC Documentation: https://nerc-project.github.io/nerc-docs/
Status page for the New England Research Cloud (NERC) and other resources.
Please scroll down to see details on any Incidents or maintenance notices.

MGHPCC SHARED SERVICES (MGHPCC-SS) ACCOUNT PORTAL - Operational

98% - uptime
Aug 2023 · 100.0% | Sep 2023 · 99.84% | Oct 2023 · 93.48%

NERC COLDFRONT - Operational

98% - uptime
Aug 2023 · 100.0% | Sep 2023 · 99.83% | Oct 2023 · 93.46%

NETWORKING - Operational

100% - uptime
Aug 2023 · 100.0% | Sep 2023 · 100.0% | Oct 2023 · 100.0%

STORAGE - Operational

100% - uptime
Aug 2023 · 100.0% | Sep 2023 · 100.0% | Oct 2023 · 100.0%

NERC WEBSITE - Operational

100% - uptime
Aug 2023 · 100.0% | Sep 2023 · 100.0% | Oct 2023 · 100.0%

NERC DOCUMENTATION WEBSITE - Operational

100% - uptime
Aug 2023 · 100.0% | Sep 2023 · 100.0% | Oct 2023 · 100.0%

NERC TICKETING SYSTEM - Operational

100% - uptime
Aug 2023 · 100.0% | Sep 2023 · 100.0% | Oct 2023 · 99.99%
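
For readers who want to translate the monthly uptime percentages above into approximate downtime, the following is a minimal sketch. It assumes each percentage is measured over the full calendar month, which this page does not state; the function name `downtime_hours` and the dictionary are illustrative only, with values copied from the account portal row.

```python
import calendar


def downtime_hours(uptime_percent: float, year: int, month: int) -> float:
    """Approximate hours of downtime implied by a monthly uptime percentage.

    Assumption: the percentage covers every hour of the calendar month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    total_hours = days_in_month * 24
    return total_hours * (1 - uptime_percent / 100)


# Values copied from the MGHPCC-SS account portal row on this page.
portal_uptime = {(2023, 8): 100.0, (2023, 9): 99.84, (2023, 10): 93.48}

for (year, month), pct in portal_uptime.items():
    hours = downtime_hours(pct, year, month)
    print(f"{calendar.month_abbr[month]} {year}: about {hours:.1f} h of downtime")
```

Under that assumption, the October 2023 figure of 93.48% works out to roughly 48.5 hours, which is in line with the "past couple of days" mentioned in the October incident notice.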

Notice history

Oct 2023

NERC OpenStack and OpenShift services are down!
  • Resolved
    • This incident has been resolved.

    – We are happy to report that NERC production services are now back online and ready for use. NERC is back to its normal mode of operation.

    – If you encounter any problems with your current instances, storage, or setup, please don't hesitate to contact us via email (help@nerc.mghpcc.org) or by submitting a new ticket at NERC's Support Ticketing System (osTicket). We appreciate your patience and understanding during the past couple of days as we worked to bring NERC services back online.

  • Monitoring

    – Our team continues to work to restore NERC services. We continue to address issues with networking and NERC’s backend authentication, which are preventing services from returning to full production.

    – Until the backend authentication issues are resolved, you will not be able to log in to the NERC OpenStack and OpenShift web dashboards.

    – We are continuing to work on a fix for this incident. We regret any disruption this may have caused you and your team.

    – Please don't hesitate to contact us via email (help@nerc.mghpcc.org) or by submitting a new ticket at NERC's Support Ticketing System (osTicket). We appreciate your understanding as we continue to restore NERC to full production.

    – Our next update on this outage will be at 5:00 PM.

  • Identified

    – Our dedicated team is working hard to restore NERC services. We continue to address issues with NERC’s backend authentication and networking, which are preventing services from returning to full production.

    – We are continuing to work on a fix for this incident. We regret any disruption this may have caused you and your team.

    – Please don't hesitate to contact us via email (help@nerc.mghpcc.org) or by submitting a new ticket at NERC's Support Ticketing System (osTicket). We appreciate your understanding as we continue to restore NERC to full production.

    – Our next update on this outage will be at 1:00 PM.

  • Monitoring

    Update from MGHPCC: Sunday, October 15 at 11:45 AM

    Power has been restored to all racks in the computer room, and the data center is back to its normal mode of operation. We implemented a fix and are currently monitoring the result.

    We are now powering on our controllers and nodes on the NERC side.

  • Update

    Update from MGHPCC: Sunday, October 15 at 11:22 AM

    The power-on sequence for the computer room is now under way.


    We are continuing to work on a fix for this incident.

  • Identified

    Update from MGHPCC: Sunday, October 15 at 9:12 AM

    We believe we have found the root cause of the problem that prevented the computer room generator from going into service when utility power failed.

    One more test to verify the diagnosis, and then we’ll start restoring computer room power.

    Power will be restored in stages.


    We are continuing to work on a fix for this incident.

  • Investigating

    Due to a major power event at the MGHPCC (Holyoke) data center, NERC OpenStack and OpenShift services are down. We have confirmed that power to UPS-backed systems in the Computer Room and Entrance Rooms is down. We are currently investigating this incident.

Sep 2023

The upcoming Red Hat OpenShift Container Platform (OCP) version upgrade on NERC will occur on Wednesday, September 27, 2023 from 8:00 AM – 5:00 PM.
  • Completed
    September 28, 2023 at 1:02 AM

    Maintenance has completed successfully.

  • Update
    September 27, 2023 at 8:17 PM
    In progress

    The maintenance requires additional incremental upgrades between OpenShift versions, so we are extending the window until midnight.

  • In progress
    September 27, 2023 at 12:00 PM

    Maintenance is now in progress.

  • Planned
    September 27, 2023 at 12:00 PM

    The upcoming upgrade of Red Hat’s OpenShift Container Platform (OCP) on NERC from version 4.10 to 4.13 will occur on Wednesday, September 27, 2023 from 8:00 AM – 5:00 PM.

    GENERAL MAINTENANCE

    • We are writing to inform you that we are planning to upgrade our OpenShift cluster from version 4.10 to 4.13 on Wednesday, September 27, 2023. Because Red Hat’s maintenance support for our current version of OpenShift ended on September 10, 2023, this upgrade is crucial, and it will bring several new features and improvements to our cluster. We will also enable GPU resources on our OpenShift and Red Hat OpenShift Data Science Platform (RHODS), which will provide a fully supported sandbox environment for data scientists and researchers to develop, train, and test AI/ML models and deploy them for use in intelligent applications.

    NOTICES

    • Please keep a backup of any critical data or applications running in your project on NERC’s OpenShift cluster.

    • The upgrade to the latest OpenShift version will provide improved performance, enhanced security, additional operators, and an enhanced user interface, along with an updated Kubernetes version.

    • The estimated time to complete this update is 4 hours. It can take more or less time, so we urge you to keep an eye on https://nerc.instatus.com/ to follow the progress during this time.

    • During the upgrade, there will be a period of downtime to ensure a seamless transition. We anticipate this downtime to last approximately 1 day, and we apologize for any inconvenience this may cause. It can take more or less time, so we urge you to keep an eye on https://nerc.instatus.com/ and to subscribe at https://nerc.instatus.com/subscribe/email to receive progress updates during this time.

    • If you have not yet used or tested our state-of-the-art OpenShift and RHODS platforms, they enable you and your team to deploy containerized applications in a cloud-native environment, providing a reliable, isolated, and scalable solution for your complex research computing and teaching needs. Please get started by requesting a new resource allocation for your project using NERC’s ColdFront web console, or get in touch with us for a quick demo.

    More information about NERC is available on NERC’s website (https://nerc.mghpcc.org/). If you have any questions, please don’t hesitate to reach out to us via email (help@nerc.mghpcc.org) or by submitting a new ticket at NERC's Support Ticketing System (osTicket).

    Thanks,

    New England Research Cloud (NERC)

    https://nerc.mghpcc.org/

    https://nerc-project.github.io/nerc-docs/

Upcoming NERC OpenStack Maintenance - Additional GPU hosts and RadosGW improvements: Monday, September 18, 2023, 9:00 AM - 1:00 PM
  • Completed
    September 19, 2023 at 3:15 PM

    Maintenance has completed successfully.

  • Update
    September 18, 2023 at 5:02 PM
    In progress

    Maintenance is still in progress. We are extending this maintenance for another couple of hours and will update as this progress …

  • In progress
    September 18, 2023 at 1:00 PM

    Maintenance is now in progress.

  • Planned
    September 18, 2023 at 1:00 PM

    NERC’s planned OpenStack maintenance (additional GPU hosts and RadosGW improvements) will occur on Monday, September 18, 2023 from 9:00 AM - 1:00 PM.

    GENERAL MAINTENANCE

    • NERC will be adding additional GPU nodes (K80) to the OpenStack deployment. We will also be performing maintenance on the RadosGW (Swift/S3) object storage service.

    NOTICES

    • During this time the RadosGW object storage service will be unavailable.

    • Please be aware that during this maintenance, project requests and project change requests will not be accepted or approved.

    • The estimated time to complete this update is 4 hours. It can take more or less time, so we urge you to keep an eye on https://nerc.instatus.com/ to follow the progress during this time.

    • Please subscribe to NERC’s status page for any future updates: https://nerc.instatus.com/subscribe/email

    Our priority is to help make science happen, so if you or your research team have any questions or need to escalate an issue, please don't hesitate to reach out to us via email (help@nerc.mghpcc.org) or by submitting a new ticket at NERC's Support Ticketing System (osTicket).

    Thanks,

    New England Research Cloud (NERC)

    https://nerc.mghpcc.org/

    https://nerc-project.github.io/nerc-docs/

Aug 2023

NERC OpenShift Container Platform (OCP) Maintenance [August 21, 2023 9:00 AM - 5:00 PM]
  • Completed
    August 21, 2023 at 4:53 PM

    Maintenance has completed successfully.

  • In progress
    August 21, 2023 at 1:00 PM

    Maintenance is now in progress.

  • Planned
    August 21, 2023 at 1:00 PM

    NERC’s planned OpenShift Container Platform (OCP) maintenance will occur on Monday, August 21, 2023 from 9:00 AM – 5:00 PM.

    GENERAL MAINTENANCE

    • We will be moving the OpenShift container platform worker nodes to a new location within the datacenter. The core OpenShift services will be interrupted during this time. Any critical workloads that are deployed in the cluster need to be stopped until the maintenance is complete.

    NOTICES

    • We will be powering off all OpenShift cluster hosts prior to the move.

    • The NERC OpenShift cluster will be unavailable during this time. Please let us know if you encounter any issues after the maintenance has completed.

    • The estimated time to complete this update is 8 hours. It can take more or less time, so we urge you to keep an eye on https://nerc.instatus.com/ to follow the progress during this time.

    • Please subscribe to NERC’s status page for any future updates: https://nerc.instatus.com/subscribe/email

    Our priority is to help make science happen, so if you or your research team have any questions or need to escalate an issue, please contact us via email (help@nerc.mghpcc.org).

    Thanks,

    New England Research Cloud (NERC)

    https://nerc.mghpcc.org/

    https://nerc-project.github.io/nerc-docs/
