Monitoring - Over the last few weeks we have implemented some changes that have improved IdM performance to a degree:
- Dataset and Project access inconsistencies on different nodes are much less prevalent.
- The OnDemand service is still slower than we would like to load and launch sessions, but it has improved.
- General stability of the system is significantly better.
We have further client IdM changes in the pipeline, to be implemented over the next week or so. Initially these will be for the login nodes, and for the protocol nodes which service Windows mounts. Specific status pages will be posted for these changes.
For the moment we will leave the Identity service marked as being in a degraded state.

Apr 10, 2025 - 12:46 NZST
Identified - We have an ongoing issue with the eRI’s identity service which is causing a variety of inconsistent user experiences on the platform (and may also be behind other problems that we have not yet linked to it). This issue has (unfortunately) been ongoing since at least Q4 2024; we recently realised there was a gap in our Statuspage communications about it, hence the degraded state now recorded against the Identity Service.

We have been working with vendor support from Red Hat over the last few months and are in the process of engaging them to do further analysis (and assist with fixes and/or mitigations).

This issue results in symptoms such as:
- OnDemand service being slow to load and launch sessions.
- Globus file listing timeouts.
- The filesystem feeling slow; commands such as "ls -l" take a long time.
- The above symptom occurs because full group resolution takes a long time (~1 minute or more) for an initial (non-cached) query, then completes quickly until the local caches expire. This can result in slow or inconsistent performance for I/O-heavy workloads whenever file access or metadata operations involve group resolution. In some cases this can be mitigated by using numeric user and group IDs instead (e.g. "ls -n"); see the sketch after this list. There are many different shell commands and interactions that might experience this issue.
- Dataset and Project access inconsistencies on different nodes. In some cases the local caches are populated with incomplete data (due to upstream timeouts) which then results in a machine having an incomplete group resolution for an impacted user. This might be experienced by the user as an inability to access data in a Dataset that they are a member of.
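
If you want a quick way to check whether your session is affected, the following is a minimal sketch (the project path below is a placeholder, not a real path on the platform):

    time ls -l /path/to/your/project   # a cold (non-cached) run may stall while user/group names are resolved via IdM
    time ls -n /path/to/your/project   # numeric IDs skip the name lookups and should stay fast even when IdM is slow
    id                                 # lists the groups this node currently resolves for you; a Dataset group missing
                                       # here (but present on another node) suggests an incomplete local cache and is
                                       # worth reporting to support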

If you are experiencing any of these issues or something like them, please do still report them to support so we can effectively track the impact and look at mitigations for your particular issue.

Jan 29, 2025 - 17:30 NZDT


AgResearch eRI status

- Identity Broker Service: Operational (99.96% uptime over the past 90 days)
- Managed Storage Service: Operational (100.0% uptime over the past 90 days)
- General Flexi HPC Platform: Operational (100.0% uptime over the past 90 days)
- Network connectivity: Operational (100.0% uptime over the past 90 days)
- Compute cluster: Operational (100.0% uptime over the past 90 days)
- Login nodes: Operational (99.79% uptime over the past 90 days)
Apr 26, 2025

No incidents reported today.

Apr 25, 2025

No incidents reported.

Apr 24, 2025

No incidents reported.

Apr 23, 2025

No incidents reported.

Apr 22, 2025

No incidents reported.

Apr 21, 2025

No incidents reported.

Apr 20, 2025

No incidents reported.

Apr 19, 2025

No incidents reported.

Apr 18, 2025

No incidents reported.

Apr 17, 2025
Resolved - This incident has been resolved.
Apr 17, 16:32 NZST
Monitoring - We have found only a few related issues and these have been rectified. We have also re-synced all user groups via Coldfront as a preventative measure. If you do find an issue, please advise support@nesi.org.nz and we will address it as soon as possible.
Apr 17, 11:37 NZST
Investigating - We are investigating an issue where project/dataset owners have lost access to their folders. Apologies for the inconvenience; we are working to resolve this as soon as possible.
Apr 17, 09:10 NZST
Completed - The scheduled maintenance has been completed.
Apr 17, 16:32 NZST
Verifying - The globus0 and globus1 sssd cache changes have now been implemented.
Apr 17, 14:02 NZST
Scheduled - As the next step in improving group membership lookups and general IdM performance, we will be applying cache changes to globus0 and globus1. No impact is expected.
Apr 17, 08:28 NZST
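
For context, the cache changes above relate to the local sssd caches on globus0 and globus1. The exact settings applied were not published in these updates; purely as an illustrative, hypothetical sketch, this kind of change on a node typically looks like:

    # after editing the cache timeout settings in /etc/sssd/sssd.conf (illustrative only, not the actual change applied)
    sudo sssctl config-check       # validate the edited configuration
    sudo systemctl restart sssd    # restart sssd so the new cache settings take effect
    sudo sss_cache -E              # expire all locally cached users/groups so complete entries are re-fetched from IdM
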
Apr 16, 2025

No incidents reported.

Apr 15, 2025
Completed - This maintenance was completed yesterday afternoon without any negative impact.
Apr 15, 10:16 NZST
Scheduled - As the next step in improving group membership lookups and general IdM performance, we will be applying cache changes to a02hgp02. This is one of two nodes that provide Windows mount access to the PFSS filesystems, persist and scratch. No impact to these mounts is expected.
Apr 10, 15:01 NZST
Completed - This maintenance was completed yesterday afternoon without any negative impact.
Apr 15, 10:15 NZST
Scheduled - As the next step in improving group membership lookups and general IdM performance, we will be applying cache changes to a02hgp01. This is one of two nodes that provide Windows mount access to the PFSS filesystems, persist and scratch. No impact to these mounts is expected.
Apr 10, 14:58 NZST
Apr 14, 2025
Completed - The maintenance and subsequent testing are now complete. login-0 is available again.
Apr 14, 14:24 NZST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 14, 14:00 NZST
Scheduled - As the next step in improving group membership lookups and general IdM performance, we will be applying cache and domain changes to login-0 on Monday, April 14th, at 1400hrs. As part of this change, login-0 will be rebooted and all sessions will be killed. All users should move to login-1 prior to this maintenance.
Apr 10, 14:50 NZST
Apr 13, 2025

No incidents reported.

Apr 12, 2025

No incidents reported.