Identified - We have an ongoing issue with the eRI’s identity service, which is causing a variety of inconsistent user experiences on the platform (and may be behind other problems that we have not yet linked to it). Unfortunately, this issue has been ongoing since at least Q4'24. We recently realised there was a gap in our Statuspage communications about it, hence the degraded state now recorded against the Identity Service.

We have been working with vendor support from Red Hat over the last few months and are in the process of engaging them to do further analysis (and assist with fixes and/or mitigations).

This issue results in symptoms such as:
- OnDemand service being slow to load and launch sessions.
- Globus file listing timeouts.
- The filesystem feeling slow, with commands such as "ls -l" taking a long time.
  - This is because full group resolution takes a long time (~1 minute or more) for an initial (non-cached) query and then completes quickly until local caches expire. As a result, users may experience slow or inconsistent performance for IO-heavy workloads when group resolution is involved in file access or metadata operations. In some cases this may be mitigated by using numeric user and group IDs instead (e.g. "ls -n"). Many different shell commands and interactions might be affected.
- Dataset and Project access inconsistencies on different nodes. In some cases the local caches are populated with incomplete data (due to upstream timeouts), which leaves a machine with incomplete group resolution for an impacted user. The user may experience this as an inability to access data in a Dataset that they are a member of.
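
To help describe the slow group resolution symptom in a support request, the rough Python sketch below times how long it takes to turn each of your numeric group IDs into a group name, once uncached and once again straight afterwards. It is an illustration only, not an official eRI diagnostic: the function name time_group_resolution is hypothetical, and it assumes a Linux node with Python 3. A slow first pass followed by a fast second pass would be consistent with the caching behaviour described above, but does not by itself prove this incident is the cause.

```python
import grp
import os
import time


def time_group_resolution(gids=None):
    """Return a list of (gid, name_or_None, seconds) for each group ID.

    Hypothetical helper for illustration; by default it uses the group IDs
    of the current process.
    """
    if gids is None:
        # Numeric IDs are reported by the kernel, so no name lookup happens here.
        gids = sorted(set(os.getgroups()) | {os.getgid()})
    results = []
    for gid in gids:
        start = time.perf_counter()
        try:
            # The ID-to-name lookup is the step that may be slow when uncached.
            name = grp.getgrgid(gid).gr_name
        except KeyError:
            name = None  # no name could be resolved for this group ID
        results.append((gid, name, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Run twice: the second pass should normally be served from local caches.
    for label in ("first pass", "second pass"):
        rows = time_group_resolution()
        total = sum(seconds for _, _, seconds in rows)
        print(f"{label}: looked up {len(rows)} groups in {total:.3f}s")
        for gid, name, seconds in rows:
            print(f"  gid={gid} name={name!r} {seconds:.3f}s")
```

If a group is listed with name=None, or the list of groups differs between nodes, including that output in your support request may help us track the access inconsistencies described above.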

If you are experiencing any of these issues or something like them, please do still report them to support so we can effectively track the impact and look at mitigations for your particular issue.

Jan 29, 2025 - 17:30 NZDT

About This Site

AgResearch eRI status

- Identity Broker Service: Degraded Performance (100.0% uptime over the past 90 days)
- Managed Storage Service: Operational (100.0% uptime over the past 90 days)
- General Flexi HPC Platform: Operational (100.0% uptime over the past 90 days)
- Network connectivity: Operational (100.0% uptime over the past 90 days)
- Compute cluster: Operational (100.0% uptime over the past 90 days)
- Login nodes: Operational (99.79% uptime over the past 90 days)

Scheduled Maintenance

DMF v7.9 upgrade Apr 9, 2025 11:00-19:00 NZST

A DMF upgrade is scheduled for Wednesday, April 9th, from 11:00. The impact should be minimal: SMB shares will remain available and Slurm processing will continue. However, DMF will not be available to recall any project files that are offline, i.e. stored on tape. This is currently quite rare, but should you access such a file, the recall will hang, and your application may time out or throw an error. In this event, wait until the maintenance is completed and then rerun your job.
Posted on Mar 25, 2025 - 14:10 NZDT
Apr 2, 2025

No incidents reported today.

Apr 1, 2025

No incidents reported.

Mar 31, 2025

No incidents reported.

Mar 30, 2025

No incidents reported.

Mar 29, 2025

No incidents reported.

Mar 28, 2025

No incidents reported.

Mar 27, 2025

No incidents reported.

Mar 26, 2025

No incidents reported.

Mar 25, 2025

No incidents reported.

Mar 24, 2025

No incidents reported.

Mar 23, 2025

No incidents reported.

Mar 22, 2025

No incidents reported.

Mar 21, 2025

No incidents reported.

Mar 20, 2025

No incidents reported.

Mar 19, 2025

No incidents reported.