All Systems Operational

About This Site

AgResearch eRI status

Identity Broker Service: Operational, 99.77 % uptime over the past 90 days
Managed Storage Service: Operational, 100.0 % uptime over the past 90 days
General Flexi HPC Platform: Operational, 98.56 % uptime over the past 90 days
Network connectivity: Operational, 100.0 % uptime over the past 90 days
Compute cluster: Operational, 98.92 % uptime over the past 90 days
Login nodes: Operational, 99.97 % uptime over the past 90 days
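
To put the uptime percentages above in perspective, the following minimal sketch converts each figure into the equivalent amount of full-outage time over the 90-day window. The function name `implied_downtime` is hypothetical, and the calculation assumes the percentage is simply the fraction of the window during which the service was up; the page does not say how partial outages are weighted, so treat the results as rough equivalents only.

```python
from datetime import timedelta

# The uptime chart on this page covers the past 90 days.
WINDOW = timedelta(days=90)

# Uptime percentages exactly as displayed on this page.
UPTIME_PERCENT = {
    "Identity Broker Service": 99.77,
    "Managed Storage Service": 100.0,
    "General Flexi HPC Platform": 98.56,
    "Network connectivity": 100.0,
    "Compute cluster": 98.92,
    "Login nodes": 99.97,
}


def implied_downtime(uptime_percent: float, window: timedelta = WINDOW) -> timedelta:
    """Return the downtime implied by an uptime percentage over a window.

    Assumption: uptime is a plain fraction of the window, so partial
    outages are not weighted differently from full outages.
    """
    return window * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for service, percent in UPTIME_PERCENT.items():
        hours = implied_downtime(percent).total_seconds() / 3600
        print(f"{service}: {percent} % uptime, about {hours:.1f} h of downtime")
```

Under this assumption, 98.56 % uptime over 90 days corresponds to roughly 31.1 hours of downtime, while 100.0 % corresponds to none.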

Scheduled Maintenance

OpenStack control plane patching Nov 5, 2025 09:30-14:30 NZDT

We are updating OpenStack control plane components during this window. The public APIs, and therefore the Dashboard and CLI clients, may be unavailable for short periods. There is no impact on existing running cloud infrastructure or other eRI services.
Posted on Nov 05, 2025 - 09:14 NZDT

Cumulus network switch upgrades Nov 18, 2025 18:00 - Nov 19, 2025 06:00 NZDT

The border switches will be upgraded overnight on Nov 18 from 18:00. SSH sessions and other external connections to the cluster and storage may be dropped during this maintenance. Slurm jobs will be unaffected. Access to Windows data drives will likely be interrupted throughout the maintenance window.
Posted on Nov 07, 2025 - 09:25 NZDT

Nov 7, 2025

No incidents reported today.

Nov 6, 2025

No incidents reported.

Nov 5, 2025

No incidents reported.

Nov 4, 2025

No incidents reported.

Nov 3, 2025

No incidents reported.

Nov 2, 2025

No incidents reported.

Nov 1, 2025

No incidents reported.

Oct 31, 2025

No incidents reported.

Oct 30, 2025

No incidents reported.

Oct 29, 2025
Resolved - All bare metal compute nodes have now had the network config change applied, and packet loss is no longer occurring. In addition, we have since identified a workload that was causing slow I/O response from the filesystem. This workload has been removed while we work to improve it.
Oct 29, 15:32 NZDT
Update - The network config change has now been applied successfully to compute-2 and compute-4. Nix is now running better on these two nodes. We will continue to apply the same change to the remaining compute nodes as they become available.
Oct 23, 10:21 NZDT
Update - We are continuing to monitor for any further issues.
Oct 23, 10:19 NZDT
Monitoring - The cluster and login nodes now appear to be stable and performant, although Nix may be slow on some compute nodes (compute-2 and compute-4). We will be implementing a network config change on each compute node in a rolling fashion to minimise the impact. This requires draining each node in the Slurm cluster, one at a time.
Oct 22, 12:30 NZDT
Investigating - We are continuing to see slow responses from Slurm and Nix, but the problem appears to be intermittent. Investigation continues.
Oct 21, 10:07 NZDT
Monitoring - We made a network configuration change to a single node last night. The cluster has been stable overnight, with some load on it. We'll continue monitoring today as the load increases. We will make the same change to the other bare metal compute nodes, in a rolling fashion, as they become available.
Oct 21, 08:45 NZDT
Update - We have found evidence of network packet loss again and are continuing to investigate.
Oct 20, 16:23 NZDT
Investigating - We are currently investigating this issue.
Oct 20, 15:22 NZDT
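
The rolling change described in the updates above (draining each Slurm node one at a time, applying the network config change, then returning the node to service) can be sketched as a command plan. This is an illustration only, not the procedure the eRI team ran: the function name `rolling_drain_plan` and the reason string are hypothetical, the `echo` step is a placeholder because the status page does not say what the network change was, and the plan is printed rather than executed.

```python
import shlex


def rolling_drain_plan(nodes, reason):
    """Build a one-node-at-a-time drain/change/resume command plan.

    Commands are returned as argument lists and are NOT executed here.
    """
    plan = []
    for node in nodes:
        # Stop new jobs landing on the node; running jobs finish first.
        plan.append(["scontrol", "update", f"NodeName={node}",
                     "State=DRAIN", f"Reason={reason}"])
        # Poll the node state until it reports "drained" before proceeding.
        plan.append(["sinfo", "-h", "-n", node, "-o", "%T"])
        # Placeholder (assumption): the actual change is not described.
        plan.append(["echo", f"apply network config change on {node}"])
        # Return the node to service before moving on to the next one.
        plan.append(["scontrol", "update", f"NodeName={node}", "State=RESUME"])
    return plan


if __name__ == "__main__":
    # Node names taken from the incident updates on this page.
    for cmd in rolling_drain_plan(["compute-2", "compute-4"],
                                  "network config change"):
        print(shlex.join(cmd))
```

Resuming each node before draining the next keeps at most one node out of the cluster at any time, which is what limits the impact on running jobs.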
Resolved - Probably caused by a DMF dependency. DMF was being upgraded at the time.
Oct 29, 15:08 NZDT
Monitoring - The persistent filesystem has been remounted on the protocol nodes, Windows drives and datasets are available. The Nix head node has been fixed, and all Nix clients have been restarted and tested. It seems Slurm jobs were unaffected as the filesystem remained available on the compute nodes.
Oct 23, 15:01 NZDT
Investigating - It has been reinstated and is now available on the login nodes. Compute and protocol nodes are being checked. Nix will need restarting.
Oct 23, 14:38 NZDT
Oct 28, 2025

No incidents reported.

Oct 27, 2025

No incidents reported.

Oct 26, 2025

No incidents reported.

Oct 25, 2025

No incidents reported.

Oct 24, 2025
Completed - The scheduled maintenance has been completed.
Oct 24, 08:44 NZDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Oct 23, 11:00 NZDT
Scheduled - HPE will be upgrading the DMF software on Thursday, Oct 23 from about 11:00. Recalls of files from tape will not be possible during that time. No other impact is expected, and the Slurm cluster will not be affected.
Oct 21, 16:00 NZDT