All Systems Nominal 🚀
All Platform Services are operational.
Current Information on Service Availability
| Service | Status |
|---|---|
| OpenShift Container Platform | Operational |
| Single Sign On (SSO) | Operational |
| RocketChat | Operational |
| NetApp Storage Provisioning | Operational |
| Mobile Signing Service | Operational |
OpenShift 4 Application Migration Roadmap Update - September 2020
We’ve now completed two weeks of extensive planning sessions to provide the following roadmap to help guide your migration planning:
- Pre-Migration, Platform Services: September/October
- Phase 1, Early Adoption: October 14 - Mid-November (depending on the migration progress)
- Phase 2, Early Majority: Late November - Early January (TBC, contingent upon Early Adoption Phase)
- Phase 3, Late Majority: January/February (TBC, contingent upon Early Majority Phase)
- Phase 4, Sweep: Target end date - Feb 21, 2021 (OpenShift 3.11 Platform is decommissioned)
Click here for more information.
Incident Reports
- Kamloops DDoS
  On Wednesday, March 3rd, 2021, starting at 9:45AM, the Kamloops data centre encountered a notable Distributed Denial of Service (DDoS) attack. While the DDoS was short-lived, it did degrade some services, including those run on the OpenShift Container Platform (OCP).
- Kamloops DDoS
  On Saturday, February 27th, 2021, starting at 4AM, the Kamloops data centre encountered a very determined Distributed Denial of Service (DDoS) attack. This attack caused temporary outages of government email and government websites, and caused a temporary outage of the OpenShift 3 and 4 platforms, which in turn impacted the...
- SSO Dev Gateway Timeout (504)
  Today between approximately 3:30PM and 4:00PM, users accessing SSO Dev received a 504 Gateway Timeout response. What was the issue? Earlier today SSO Dev was patching its network policies to coincide with the Dev Network Policy Migration from Aporeto to K8s Network...
- Silver Cluster Login Failures
  Today between approximately 12:45PM and 2:00PM, users were not able to authenticate against the Silver cluster. This was due to a bad config within the SSO realm used for OpenShift authentication. It was quickly found and resolved. There was no impact to other client applications...
- OpenShift Partial Outage
  On Thursday, January 28th, 2021, there was a partial outage of the OpenShift Container Platform (OCP) from ~13:22 PST to ~16:17 PST. Details of the incident are as follows. TL;DR: In October we applied “Workaround 2” from https://access.redhat.com/solutions/5448851 to the clusters to resolve an issue with the k8s API. According...
- SSO Pathfinder Route Deprecation in DEV
  As scheduled back in early Fall, today the SSO Pathfinder route in DEV is being deprecated. Impact notes: if you have not already migrated your application to use dev.oidc.gov.bc.ca, your applications will no longer be able to reach the BC Gov SSO service by using dev-sso.pathfinder.gov.bc.ca. Related Devhub...
- SSO Tuning Hotfix
  A hotfix is scheduled for the BCGov Red Hat SSO instance in the dev, test, and prod environments later this evening. Remedial actions are being taken to prevent the service degradation issues that have recently been affecting the service. Changes: increasing CPU and memory requests/limits for the patroni statefulset...
- Trident Provisioner Enabled
  The Trident Service for NetApp storage provisioning on OpenShift 3.11 was turned on again at noon on Tuesday Oct 13, 2020. What is happening? Following the Trident Storage provisioning service upgrade failure on Thursday Oct 8, storage provisioning was turned off on the Platform on Friday Oct 9...
- Trident Provisioner Temporarily Disabled
  As an ongoing action following yesterday’s Service Outage, the Trident Provisioner Service for OpenShift 3.11 is being turned off while we look to resolve the issue. This issue does not impact current PVCs. Impact: no new NetApp PVCs will be provisionable while the Trident Provisioner is turned off; this will...
- OpenShift Partial Outage
  Some services and applications on the Platform, including RocketChat and TheOrgBook, seem to be experiencing service disruptions. The Platform Operations Team is troubleshooting the issue and will post an update as soon as more information is available.
- OpenShift Service Disruption
  What happened: At ~11:30AM today we became aware of an issue causing system-wide service degradation to the OpenShift Container Platform (OCP) as well as hosted applications and shared services such as SSO and RocketChat. Through investigation it was determined the issue was caused by the same router pod (HAProxy)...
- OpenShift Routing Issue
  One of the three router pods kept crashing. Each time it crashed, the VIP would get moved between the other routers and cause some connection slowness for access to services via Routes. We’ve disabled the defective pod for now and opened a case with Red Hat.
- Nginx Build Issues
  Today at approximately 2PM an issue was identified in a commonly used nginx Dockerfile. This nginx Dockerfile was changing the permissions of /etc to be wide open; this conflicted with a directory injected by our container scanning tool Aqua and caused builds to fail. To remedy this the community has...
- OpenShift Partial Outage
  An update to Aporeto today had several unintended consequences. This caused a widespread service disruption from approximately 3PM PST to 4:30PM PST. This disruption may have left some applications in an unstable state. When Aporeto is restarted it flushes firewall rules. This causes any in-flight network connections to fail....