
Enterprise Experiment Errors Entire Organizations - A Chrome update enabled experimental features causing widespread Enterprise issues

rcmaehl

Disclaimer: I made this post on mobile, so the formatting might be broken.

 

Sources:

ZDNet

Chrome bug tracker

 

Summary:

An experimental flag enabled in Chrome by an update bricked Chrome for various businesses running thin clients and other terminal server setups.

 

Quotes/Excerpts:

Quote

A Google Chrome experiment has gone horribly wrong...and ended up crashing browsers on thousands, if not more, enterprise networks for nearly two days. The issue...appeared on Wednesday. It didn't impact all Chrome users,... only Chrome running on Windows Server "terminal server" setups -- a very common setup in enterprise networks. Users said that Chrome tabs were going blank...in what's called a "White Screen of Death" (WSOD). System administrators at many companies reported that hundreds and thousands of employees couldn't use Chrome...as the active browser tab kept going blank while working. In tightly controlled enterprise environments,...employees didn't have the option to change browsers and were left unable to do their jobs. System administrators couldn't just replace Chrome with another browser right away. "This has had a huge impact for all our Call Center agents and not being able to chat with our members," someone with a Costco email address said in a bug report. "Our organization with multiple large retail brands had 1000 call center agents and many IT people affected for 2 days. This had a very large financial impact," said another user. Hundreds of complaints poured in via Google's support forum, Chrome bug tracker, and Reddit. One impacted sysadmin told ZDNet that they initially mistook the Chrome blank tabs as a sign of malware and reacted accordingly, starting network-wide security audits. The root cause of the bug was eventually found, and traced back to a feature called "WebContents Occlusion." This is an experimental feature that suspends Chrome tabs when users move other app windows on top of Chrome, treating the active Chrome tab as a background tab. The feature,... had been under testing in Chrome Canary and Chrome Beta releases all year. This week, Google decided to test it in the main Stable release, so it could get more feedback on how it behaved. To say it behaved badly is an understatement.
The Chrome team said they pushed a new Chrome configuration file to all Chrome users and disabled the experiment. Chrome engineers operate a system called Finch that lets them push updated Chrome settings to active installs, such as enabling or disabling experimental flags. However, fixing the problem actually made system administrators even angrier. Many didn't know that Chrome engineers could run experiments on their tightly-controlled Chrome installations, let alone that Google engineers could just ship changes to everyone's browsers without any prior approval.

 

My Thoughts:

OOF. Those environments running Citrix and Thin clients sound like they had a fun week. This was an expensive lesson in redundancy for some businesses and they want Google to be at fault. Whether or not Google should be held at fault is up to you to decide but I personally believe they can't be held fully at fault.

PLEASE QUOTE ME IF YOU ARE REPLYING TO ME

Desktop Build: Ryzen 7 2700X @ 4.0GHz, AsRock Fatal1ty X370 Professional Gaming, 48GB Corsair DDR4 @ 3000MHz, RX5700 XT 8GB Sapphire Nitro+, Benq XL2730 1440p 144Hz FS

Retro Build: Intel Pentium III @ 500 MHz, Dell Optiplex G1 Full AT Tower, 768MB SDRAM @ 133MHz, Integrated Graphics, Generic 1024x768 60Hz Monitor


 


Yay, more permission exploits and no prior approval. What else is new? :D Taking after its Microsoft Windows brethren it seems, breaking things in the process.


This is why you should always have a backup browser.

 

Also, I was thinking of posting this news earlier but I kept bumping into verge.

Specs: Motherboard: Asus X470-PLUS TUF gaming (Yes I know it's poor but I wasn't informed) RAM: Corsair VENGEANCE® LPX DDR4 3200MHz CL16-18-18-36 2x8GB

CPU: Ryzen 9 5900X Case: Antec P8 PSU: Corsair RM850x Cooler: Antec K240 with two Noctua Industrial PPC 3000 PWM

Drives: Samsung 970 EVO plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168 GPU: EVGA RTX 2080 ti Black edition


1 hour ago, rcmaehl said:

Many didn't know that Chrome engineers could run experiments on their tightly-controlled Chrome installations, let alone that Google engineers could just ship changes to everyone's browsers without any prior approval.

LADS, IT'S GOOGLE.


Just now, HarryNyquist said:

LADS, IT'S GOOGLE.

Exactly why I think Google can't be 100% to blame.



 


I imagine that there are zero thin clients using Chrome Canary/Beta, so the options here are either that Google runs experiments in the release channel, or they just push it to everyone in the release channel. Running an experiment in the release channel, where they only enable the feature for a subset of users and can easily disable it if things go wrong, is a very good solution IMO, and one that is used by Firefox too.
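
To make "enable the feature for a subset of users and easily disable it" concrete, here is a minimal sketch of a percentage rollout with an off switch. It is not how Chrome or Finch is implemented; the function names, the hashing scheme, and the `kill_switch` parameter are all assumptions made purely for illustration.

```python
import hashlib


def bucket(client_id: str, experiment: str, buckets: int = 100) -> int:
    """Deterministically map a client to a bucket in [0, buckets).

    Hypothetical scheme: hash the experiment name together with the client id
    so that each experiment gets its own, stable assignment of clients.
    """
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(client_id: str, experiment: str,
               rollout_percent: int, kill_switch: bool = False) -> bool:
    """Return True if this client should get the experimental feature.

    rollout_percent: share of clients (0-100) that get the feature.
    kill_switch: when set, the feature is off for everyone, whatever the
    rollout percentage says.
    """
    if kill_switch:
        return False
    return bucket(client_id, experiment) < rollout_percent
```

Because the bucket is stable per client, raising `rollout_percent` only ever adds clients, and flipping `kill_switch` turns the feature off for all of them at once. In this thread's incident the complaint was that the equivalent of the percentage was set for everyone rather than a small subset.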

 

Also, to describe it as "a very common setup in enterprise networks" is a bit unfair - it is obviously used by a few big industries, like call centers, but it isn't necessarily a use case that the majority of Google engineers would be, or would usually need to be, aware of. That resulted in a bug in this instance, but I'm not sure what better software engineering practices they could have followed to prevent this from happening in future. However, I'm certain that the team responsible will be doing a post-mortem to identify the things that they should do to prevent something like this from occurring in future.

HTTP/2 203


22 hours ago, HarryNyquist said:

LADS, IT'S GOOGLE.

I really dislike the article's use of "experiments". Gmail was just an experiment for the longest time; same with Google Maps. As for the bit you quoted, just to clarify: it wasn't something untested, and it is quite common to do a small deployment to test in the final release version (in this case it was just flagged for everyone instead of a smaller subset).

 

Quote

The feature,... had been under testing in Chrome Canary and Chrome Beta releases all year. This week, Google decided to test it in the main Stable release, so it could get more feedback on how it behaved

If the problem wasn't caught in Canary and Beta in a year of testing, then Google really can't be blamed too much for this (aside from making it a global release).

 

3735928559 - Beware of the dead beef


42 minutes ago, wanderingfool2 said:

I really dislike the article's use of "experiments". Gmail was just an experiment for the longest time; same with Google Maps. As for the bit you quoted, just to clarify: it wasn't something untested, and it is quite common to do a small deployment to test in the final release version (in this case it was just flagged for everyone instead of a smaller subset).

 

If the problem wasn't caught in Canary and Beta in a year of testing, then Google really can't be blamed too much for this (aside from making it a global release).

 

 

The issue here isn't really that it went bad (unless it turns out Google utterly screwed up the year-long test phase); it's that Google pushed an update without telling the customer about it, making it very difficult for them to even diagnose the cause, and without giving them any means to roll back or disable the feature if it went bad.

I mean, imagine if a major game pushed out a major patch without informing anyone and it caused a system-crashing bug that became a crash loop on first crash. The response would be downright volcanic in magnitude.


9 hours ago, CarlBar said:

 

The issue here isn't really that it went bad (unless it turns out Google utterly screwed up the year-long test phase); it's that Google pushed an update without telling the customer about it, making it very difficult for them to even diagnose the cause, and without giving them any means to roll back or disable the feature if it went bad.

I mean, imagine if a major game pushed out a major patch without informing anyone and it caused a system-crashing bug that became a crash loop on first crash. The response would be downright volcanic in magnitude.

One thing that I think should be brought up is that it shouldn't have been too hard to diagnose the cause, and proper management of the software should have prevented a company using terminal servers from experiencing this issue. If a company relies on a web browser to do its work, and forces a single browser on its users (without an option or backup), then I would expect that company to also be blocking those updates from arriving (you can do so easily with firewall rules blocking the update) and then manually deploying the new versions.

Honestly, if Chrome began giving a white screen, it wouldn't be too hard to figure out that an update was the likely culprit. I mean, Chrome broke for some SEP users less than a month ago (the Chrome update used a more secure code feature that some versions of SEP didn't like)... it took like 5 minutes to realize there was a Chrome update, and another 5 minutes to realize the antivirus was incorrectly flagging it.

 



30 minutes ago, wanderingfool2 said:

One thing that I think should be brought up is that it shouldn't have been too hard to diagnose the cause, and proper management of the software should have prevented a company using terminal servers from experiencing this issue. If a company relies on a web browser to do its work, and forces a single browser on its users (without an option or backup), then I would expect that company to also be blocking those updates from arriving (you can do so easily with firewall rules blocking the update) and then manually deploying the new versions.

Honestly, if Chrome began giving a white screen, it wouldn't be too hard to figure out that an update was the likely culprit. I mean, Chrome broke for some SEP users less than a month ago (the Chrome update used a more secure code feature that some versions of SEP didn't like)... it took like 5 minutes to realize there was a Chrome update, and another 5 minutes to realize the antivirus was incorrectly flagging it.

 

 

 

The problem here is that, as far as anyone at these data centers knew, nothing in the software configurations had changed. It isn't just that Google didn't inform them they were deploying a patch; the end users were completely unaware a patch had occurred, so they couldn't finger the update as a probable culprit because they weren't aware there had been a change. Everyone was probably scrambling to figure out what had changed to cause this, hence at least one data center believing they'd been hit with a malware attack.

I suspect @leadeater can probably talk much more thoroughly about why they might have configured things the way they did. I suspect, however, that whatever the reasons, everyone is probably busy changing things.


On 11/15/2019 at 10:06 AM, gabrielcarvfer said:

And that's why browsers are basically malware.

Which browser do you use?


12 minutes ago, CarlBar said:

The problem here is that, as far as anyone at these data centers knew, nothing in the software configurations had changed. It isn't just that Google didn't inform them they were deploying a patch; the end users were completely unaware a patch had occurred, so they couldn't finger the update as a probable culprit because they weren't aware there had been a change. Everyone was probably scrambling to figure out what had changed to cause this, hence at least one data center believing they'd been hit with a malware attack.

I suspect @leadeater can probably talk much more thoroughly about why they might have configured things the way they did. I suspect, however, that whatever the reasons, everyone is probably busy changing things.

The slightly more accurate problem description is that no Chrome update was applied that caused the issue. Chrome updates are disabled in these configurations and admins were reporting the issue across different versions of Chrome. What happened is Google pushed a configuration change out and then retracted it, so if you were trying to do any kind of troubleshooting and root cause analysis you were left with a situation where no software updates had been applied, no Windows updates had been applied, and the server hadn't been rebooted, so ahh??.... bang head into wall until you pass out.

If it were an update to Chrome, you'd not only find that extremely quickly, you'd also know that an update had been applied because you did it, because that's the only possible way Chrome can update.


Oh and the solution is to apply default configuration changes using a software update so you can actually have change tracking. Google is just asking for people to switch to Firefox or Edge, probably Edge now that it's Chrome anyway but with the Microsoft GPO blessing.
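
As a rough illustration of what change tracking buys you, the sketch below diffs two snapshots of a machine's effective browser settings. Everything in it is an assumption for illustration: the setting names are made up, and how you would actually capture such a snapshot from a real install is not covered here.

```python
ABSENT = "<absent>"  # placeholder for a setting missing from one snapshot


def diff_settings(before: dict, after: dict) -> dict:
    """Return {setting: (old_value, new_value)} for every setting that differs.

    Settings present in only one snapshot are reported with ABSENT on the
    other side, so additions and removals show up as changes too.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key, ABSENT)
        new = after.get(key, ABSENT)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical snapshots: same browser version, but a flag flipped silently.
monday = {"browser_version": "78.0", "experimental_feature": False}
wednesday = {"browser_version": "78.0", "experimental_feature": True}
changed = diff_settings(monday, wednesday)
```

Here `changed` contains only the flipped flag, which is exactly the clue that was missing when the version number, Windows updates, and reboot history all said nothing had changed.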


8 minutes ago, leadeater said:

Edge now that it's Chrome anyway but with the Microsoft GPO blessing

How does one do that with Group Policy? Let's say one has twenty endpoints to manage: is it done remotely, or does the sysadmin have to manually configure each endpoint?

There is more that meets the eye
I see the soul that is inside

 

 


14 minutes ago, captain_to_fire said:

How does one do that with Group Policy? Let's say one has twenty endpoints to manage: is it done remotely, or does the sysadmin have to manually configure each endpoint?

It's more that GPO has, and has always had, options for configuring IE and Edge, and Microsoft doesn't push changes down to browsers like Google does. So the Microsoft way would have been an update to Windows and an update to Edge, and the KB article would have included a description of the new feature, what the default setting is, and that it can be configured using GPO.

 

So even if it were on by default it would only happen after an update and you'd have easily accessible information on the update that caused the issue and information contained in it on how to control the problematic setting.

 

Chrome = YOLO

Edge = Honor the GPO/Admin

 

Edit:

Any computer joined to the domain computes what is termed a Resultant Set of Policy. GPOs are applied to either Computer Objects or User Objects in the domain, and you can be basically as granular as you like, so you can have the feature enabled for everything that is not an RDS server, or for everyone who is not a call center agent, or any combination you like.

 

Objects are organised in a tree-like structure of Organisational Units (OUs) and Containers. Of the two, only OUs can have GPOs linked to them (GPOs can also be linked at the domain and site level). GPOs apply down the tree, and where a conflicting policy is defined in more than one GPO, the one furthest down the tree, i.e. closest to the affected object, supersedes the higher-level GPO.
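
The "closest to the object wins" rule can be sketched in a few lines. This is a deliberately simplified model, not how Windows computes Resultant Set of Policy: it ignores enforced links, blocked inheritance, link order, and security filtering. The OU names and setting names are hypothetical, and the domain is modelled as the top node of the tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OU:
    """A node in the directory tree with the settings of the GPOs linked to it."""
    name: str
    parent: Optional["OU"] = None
    policies: dict = field(default_factory=dict)


def effective_policy(ou: OU) -> dict:
    """Merge settings from the top of the tree down to `ou`.

    Settings defined lower in the tree overwrite conflicting settings defined
    higher up, so the definition closest to the object wins.
    """
    chain = []
    node = ou
    while node is not None:
        chain.append(node)
        node = node.parent
    result = {}
    for node in reversed(chain):  # top of the tree first, closest OU last
        result.update(node.policies)
    return result


# Hypothetical tree: feature on by default, switched off for RDS servers only.
domain = OU("example.local",
            policies={"ExperimentalFeature": "enabled", "HomePage": "intranet"})
servers = OU("Servers", parent=domain)
rds = OU("RDS", parent=servers, policies={"ExperimentalFeature": "disabled"})
workstations = OU("Workstations", parent=domain)
```

With this tree, `effective_policy(rds)` has the feature disabled while `effective_policy(workstations)` inherits the domain-wide default, which is the "enabled for everything that is not an RDS server" case described above.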

 

Example image from Google: view-gpo-scope.png


Chrome's GPOs are perfectly workable. The problem here is that Google turned on an experimental feature out-of-band in existing installs. It wasn't part of an update or anything! If you think Microsoft is any better, I will point you to the entire history of Windows 10 updates between July 29, 2015 and now on non-enterprise editions of Windows.

Intel 11700K - Gigabyte 3080 Ti- Gigabyte Z590 Aorus Pro - Sabrent Rocket NVME - Corsair 16GB DDR4


4 minutes ago, jake9000 said:

Chrome's GPOs are perfectly workable.

According to the bug tracker there is no GPO for this setting currently, at least that is what people have been posting in it.
