Enterprise Experiment Errors Entire Organizations - A Chrome update enabled experimental features causing widespread Enterprise issues

rcmaehl · November 15, 2019

Disclaimer: Warning, I made this post on mobile, so the formatting might be borked.

Sources:

Summary:

An experimental flag enabled in Chrome by an update bricked Chrome for various businesses running Thin Clients and other Terminal Servers.

Quotes/Excerpts:

Quote

A Google Chrome experiment has gone horribly wrong...and ended up crashing browsers on thousands, if not more, enterprise networks for nearly two days. The issue...appeared on Wednesday. It didn't impact all Chrome users,... only Chrome running on Windows Server "terminal server" setups -- a very common setup in enterprise networks. Users said that Chrome tabs were going blank...in what's called a "White Screen of Death" (WSOD). System administrators at many companies reported that hundreds and thousands of employees couldn't use Chrome...as the active browser tab kept going blank while working. In tightly controlled enterprise environments,...employees didn't have the option to change browsers and were left unable to do their jobs. System administrators couldn't just replace Chrome with another browser right away. "This has had a huge impact for all our Call Center agents and not being able to chat with our members," someone with a Costco email address said in a bug report. "Our organization with multiple large retail brands had 1000 call center agents and many IT people affected for 2 days. This had a very large financial impact," said another user. Hundreds of complaints poured in via Google's support forum, Chrome bug tracker, and Reddit. One impacted sysadmin told ZDNet that they initially mistook the Chrome blank tabs as a sign of malware and reacted accordingly, starting network-wide security audits. The root cause of the bug was eventually found, and traced back to a feature called "WebContents Occlusion." This is an experimental feature that suspends Chrome tabs when users move other app windows on top of Chrome, treating the active Chrome tab as a background tab. The feature,... had been under testing in Chrome Canary and Chrome Beta releases all year. This week, Google decided to test it in the main Stable release, so it could get more feedback on how it behaved. It behaved badly is an understatement.
The Chrome team said they pushed a new Chrome configuration file to all Chrome users and disabled the experiment. Chrome engineers operate a system called Finch that lets them push updated Chrome settings to active installs, such as enabling or disabling experimental flags. However, fixing the problem actually made system administrators even angrier. Many didn't know that Chrome engineers could run experiments on their tightly-controlled Chrome installations, let alone that Google engineers could just ship changes to everyone's browsers without any prior approval.

My Thoughts:

OOF. Those environments running Citrix and Thin clients sound like they had a fun week. This was an expensive lesson in redundancy for some businesses and they want Google to be at fault. Whether or not Google should be held at fault is up to you to decide but I personally believe they can't be held fully at fault.

Windows7ge · November 15, 2019

Yay, more permission exploits and no prior approval. What else is new? Taking after its Microsoft Windows brethren it seems, breaking things in the process.

williamcll · November 15, 2019

This is why you should always have a backup browser.

Also, I was thinking of posting this news earlier but I kept bumping into verge.

HarryNyquist · November 15, 2019

1 hour ago, rcmaehl said:

Many didn't know that Chrome engineers could run experiments on their tightly-controlled Chrome installations, let alone that Google engineers could just ship changes to everyone's browsers without any prior approval.

LADS, IT'S GOOGLE.

rcmaehl · November 15, 2019

Just now, HarryNyquist said:

LADS, IT'S GOOGLE.

Exactly why I think Google can't be 100% to blame.

TempestCatto · November 16, 2019

Who do they think they are, Microsoft?

colonel_mortis · November 16, 2019

I imagine that there are zero thin clients using Chrome Canary/Beta, so the options here are either that Google runs experiments in the release channel, or they just push it to everyone in the release channel. Running an experiment in the release channel, where they only enable the feature for a subset of users and can easily disable it if things go wrong, is a very good solution IMO, and one that is used by Firefox too.

Also, to describe it as "a very common setup in enterprise networks" is a bit unfair - it is obviously used by a few big industries, like call centers, but it isn't necessarily a use case that the majority of Google engineers would be, or would usually need to be, aware of. That resulted in a bug in this instance, but I'm not sure what better software engineering practices they could have followed to prevent this from happening in future. However, I'm certain that the team responsible will be doing a post-mortem to identify the things that they should do to prevent something like this from occurring in future.
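To make the "subset of users plus easy disable" idea above concrete, here is a minimal Python sketch of a percentage-based rollout with a kill switch. This is not how Finch is actually implemented; the function names, the hashing scheme, and the `kill_switch` parameter are all assumptions made up for illustration.

```python
import hashlib


def in_experiment(client_id: str, experiment: str, rollout_percent: float) -> bool:
    """Deterministically place a client in or out of an experiment.

    Hypothetical scheme: hash the experiment name together with the client ID
    and map the result onto 10,000 buckets, so the same client always gets the
    same answer for the same experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < rollout_percent * 100  # e.g. 1% -> buckets 0..99


def feature_enabled(client_id: str, experiment: str,
                    rollout_percent: float, kill_switch: bool = False) -> bool:
    """Apply the remotely pushed kill switch first, then the rollout bucket."""
    if kill_switch:
        return False
    return in_experiment(client_id, experiment, rollout_percent)


# Hypothetical usage: roll a feature out to roughly 1% of 10,000 clients.
clients = [f"client-{i}" for i in range(10_000)]
enabled = [c for c in clients if feature_enabled(c, "occlusion", 1.0)]
print(f"{len(enabled)} of {len(clients)} clients have the feature on")
```

The point of the sketch is that a 1% rollout limits the blast radius, while a 100% rollout (what reportedly happened here) leaves the kill switch as the only safety net.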

wanderingfool2 · November 16, 2019

22 hours ago, HarryNyquist said:

LADS, IT'S GOOGLE.

I really dislike the article's use of "experiments". Gmail was just an experiment for the longest time; same with Google Maps. As for the bit you quoted, just to clarify: it wasn't something untested, and it is quite common to do a small deployment to test in the final release version (in this case it was just flagged for everyone instead of a smaller subset).

Quote

The feature,... had been under testing in Chrome Canary and Chrome Beta releases all year. This week, Google decided to test it in the main Stable release, so it could get more feedback on how it behaved

If the problem wasn't caught in Canary and Beta in a year of testing, then google really can't be blamed too much for this (aside from making it a global release).

CarlBar · November 16, 2019

42 minutes ago, wanderingfool2 said:

I really dislike the article's use of "experiments". Gmail was just an experiment for the longest time; same with Google Maps. As for the bit you quoted, just to clarify: it wasn't something untested, and it is quite common to do a small deployment to test in the final release version (in this case it was just flagged for everyone instead of a smaller subset).

If the problem wasn't caught in Canary and Beta in a year of testing, then google really can't be blamed too much for this (aside from making it a global release).

The issue here isn't really that it went bad (unless it turns out Google utterly screwed up the year-long test phase), it's that Google pushed an update without telling the customer about it, thus giving them great difficulty in even diagnosing the cause, and without giving them any means to roll back or disable the feature if it went bad.

I mean imagine if a major game pushed a major patch out without informing anyone and it caused a major system crashing bug that became a crash loop on first crash. The response would be downright volcanic in magnitude.

wanderingfool2 · November 17, 2019

9 hours ago, CarlBar said:

The issue here isn't really that it went bad (unless it turns out Google utterly screwed up the year-long test phase), it's that Google pushed an update without telling the customer about it, thus giving them great difficulty in even diagnosing the cause, and without giving them any means to roll back or disable the feature if it went bad.

I mean imagine if a major game pushed a major patch out without informing anyone and it caused a major system crashing bug that became a crash loop on first crash. The response would be downright volcanic in magnitude.

One thing that I think should be brought up is that it shouldn't have been too hard to diagnose the cause, and proper management of the software should have prevented a company using terminal servers from experiencing this issue. If a company relies on a web browser to do its work, and forces a single browser on its users (without an option or backup), then I would expect said company to also be blocking those updates from arriving (you can do so easily with firewall rules blocking the update) and then manually deploying the new versions.

Honestly, if chrome began giving a white screen, it wouldn't be too hard to figure out that it likely was an update that was the culprit. I mean, Chrome broke for some SEP users less than a month ago (the chrome update used a more secure code feature that some versions of SEP didn't like)...it took like 5 minutes to realize there was a chrome update, and another 5 minutes to realize the antivirus was incorrectly flagging it.

CarlBar · November 17, 2019

30 minutes ago, wanderingfool2 said:

One thing that I think should be brought up is that it shouldn't have been too hard to diagnose the cause, and proper management of the software should have prevented a company using terminal servers from experiencing this issue. If a company relies on a web browser to do its work, and forces a single browser on its users (without an option or backup), then I would expect said company to also be blocking those updates from arriving (you can do so easily with firewall rules blocking the update) and then manually deploying the new versions.

Honestly, if chrome began giving a white screen, it wouldn't be too hard to figure out that it likely was an update that was the culprit. I mean, Chrome broke for some SEP users less than a month ago (the chrome update used a more secure code feature that some versions of SEP didn't like)...it took like 5 minutes to realize there was a chrome update, and another 5 minutes to realize the antivirus was incorrectly flagging it.

The problem here is that, as far as anyone at these data centers knew, nothing in the software configurations had changed. It isn't just that Google didn't inform them they were deploying a patch; the end users were completely unaware a patch had occurred, so they couldn't finger the update as a probable culprit because they weren't aware there had been a change. Everyone was probably scrambling to figure out what had changed to cause this, hence at least one data center believing they'd been hit with a malware attack.

I suspect @leadeater can probably talk much more thoroughly about why they might have configured things the way they did. I suspect however that whatever the reasons everyone is probably busy changing things.

amdorintel · November 17, 2019

On 11/15/2019 at 10:06 AM, gabrielcarvfer said:

And that's why browsers are basically malware.

which browser do you use?

leadeater · November 17, 2019

12 minutes ago, CarlBar said:

The problem here is that, as far as anyone at these data centers knew, nothing in the software configurations had changed. It isn't just that Google didn't inform them they were deploying a patch; the end users were completely unaware a patch had occurred, so they couldn't finger the update as a probable culprit because they weren't aware there had been a change. Everyone was probably scrambling to figure out what had changed to cause this, hence at least one data center believing they'd been hit with a malware attack.

I suspect @leadeater can probably talk much more thoroughly about why they might have configured things the way they did. I suspect however that whatever the reasons everyone is probably busy changing things.

The slightly more accurate problem description is no Chrome update was applied that caused the issue. Chrome updates are disabled in these configurations and admins were reporting the issue across different versions of Chrome. What happened is Google pushed a configuration change out then retracted the change so if you were trying to do any kind of troubleshooting and root cause analysis you were left with a situation where no software updates had been applied, no Windows updates, server hasn't been rebooted, so ahh??.... bang head in to wall until you pass out.

If it were an update to Chrome you'd not only find that extremely quickly you'd also know that an update had been applied because you did it, because that's the only possible way Chrome can update.

leadeater · November 17, 2019

Oh and the solution is to apply default configuration changes using a software update so you can actually have change tracking. Google is just asking for people to switch to Firefox or Edge, probably Edge now that it's Chrome anyway but with the Microsoft GPO blessing.

captain_to_fire · November 17, 2019

8 minutes ago, leadeater said:

Edge now that it's Chrome anyway but with the Microsoft GPO blessing

How does one do that with Group Policy if, say, one has twenty endpoints to manage? Is it done remotely, or does the sysadmin have to manually configure each endpoint?

leadeater · November 17, 2019

14 minutes ago, captain_to_fire said:

How does one do that with Group Policy if, say, one has twenty endpoints to manage? Is it done remotely, or does the sysadmin have to manually configure each endpoint?

It's more that GPO has, and has always had, options for configuring IE and Edge, and Microsoft doesn't push changes down to browsers like Google does. So the Microsoft way would have been an update to Windows and an update to Edge, and the KB article would describe the new feature, what the default setting is, and that it can be configured using GPO.

So even if it were on by default it would only happen after an update and you'd have easily accessible information on the update that caused the issue and information contained in it on how to control the problematic setting.

Chrome = YOLO

Edge = Honor the GPO/Admin

Edit:

Any computer joined to the domain computes what is termed a Resultant Set of Policy. GPOs are applied to either Computer Objects or User Objects in the domain, and you can be basically as granular as you like: you can have the feature enabled for everything that is not an RDS server, or for everyone that is not a call center agent, or any combination you like.

Objects are organised in a tree-like structure of Organisational Units (OUs) and Containers. Of the two, only OUs can have GPOs linked to them. GPOs apply down the tree, and when a conflicting policy is defined in more than one GPO, the one furthest down the tree (closest to the affected object) supersedes the higher-level GPO.
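The precedence rule above can be sketched in a few lines of Python. This is a deliberately simplified model, not how Windows actually evaluates policy: it ignores site and domain links, enforced GPOs, and blocked inheritance, and the OU names and the `OcclusionFeature` setting are hypothetical (per the bug tracker, no such policy exists for this feature).

```python
from typing import Dict, List

# Hypothetical mapping: OU path (root first, "/"-joined) -> settings from GPOs linked there.
GpoLinks = Dict[str, Dict[str, object]]


def resultant_policy(ou_path: List[str], links: GpoLinks) -> Dict[str, object]:
    """Merge GPO settings from the root OU down to the object's own OU.

    Settings are applied top-down, so a value set closer to the object
    overwrites a conflicting value set higher in the tree.
    """
    result: Dict[str, object] = {}
    current: List[str] = []
    for ou in ou_path:
        current.append(ou)
        result.update(links.get("/".join(current), {}))
    return result


# Hypothetical tree: the feature is on for the whole company, off for RDS servers.
links: GpoLinks = {
    "Corp": {"OcclusionFeature": True, "HomePage": "intranet"},
    "Corp/Servers/RDS": {"OcclusionFeature": False},
}

rds = resultant_policy(["Corp", "Servers", "RDS"], links)
desktop = resultant_policy(["Corp", "Desktops"], links)
print(rds)      # RDS servers inherit HomePage but override the feature
print(desktop)  # desktops only see the top-level settings
```

With a policy like this available, an admin could have left the feature on everywhere except the terminal servers.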

Example image from Google

jake9000 · November 17, 2019

Chrome's GPOs are perfectly workable. The problem here is that Google turned on an experimental feature out-of-band in existing installs. It wasn't part of an update or anything! If you think Microsoft is any better, I will point you to the entire history of Windows 10 updates between July 29, 2015 and now on non-enterprise editions of Windows.

leadeater · November 17, 2019

4 minutes ago, jake9000 said:

Chrome's GPOs are perfectly workable.

According to the bug tracker there is no GPO for this setting currently, at least that is what people have been posting in it.
