
Ryzen segmentation faults when compiling heavy GCC Linux loads

34 minutes ago, Drak3 said:

Intel validates that the multipliers can be changed

No, they don't validate changing the multipliers at all; they even explicitly state that they do not. Validation has a specific meaning: rigorous testing under multiple different scenarios to ensure something works, after which the tested configurations are listed. None of that is done here.

 

3 hours ago, Drak3 said:

Your link even calls out something required for ECC to be explicitly supported: validation.

It then goes on to say that Ryzen is compatible, but they don't validate it to work with ECC memory.

 

It's like saying you can't overclock a Xeon or locked sku Core i. You can, but Intel will not help you should anything go wrong, under any circumstance, unless you lie about the circumstance. AMD might be more forgiving on that front, but consumer Ryzen does not explicitly support ECC.

The implication here is that you are giving the impression that there is a higher likelihood of something going wrong or of ECC not working, when in all likelihood, as AMD have said themselves, ECC will work. You're also giving the impression that no motherboards will support ECC RAM, which is not true, since boards that do support it actually exist. If board partners want to validate it, and thereby guarantee it, they can.

 

You're also forgetting that RAM validation falls directly on board partners. Board partners must do their own validation and publish an HCL, and that is what the customer looks at.

 

Why this is a problem at all I don't know. ECC in general desktops is extremely rare, and it's as simple as saying: check your motherboard's HCL, and if the memory isn't listed, buyer beware, because even with Xeon CPUs there is RAM that does not work on certain motherboards.


My apologies if this has already been mentioned, but this is indeed an OpCache (micro-op cache) bug and AMD is currently looking at the issue. The current workaround is to either disable SMT or disable the OpCache (if your BIOS allows it). A detailed summary of the problem can be found here:


http://fujii.github.io/2017/06/23/how-to-reproduce-the-segmentation-faluts-on-ryzen/

There is also a 35+ page thread on the AMD support forums following this issue. Hopefully this is something that can be resolved with a microcode update, since it affects not only Linux but Windows as well. The problem was first identified on Linux simply because more people there are running highly parallelized workloads.
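
For anyone who wants to check their own machine, the reports boil down to running the same heavy parallel build over and over and counting how often it dies. A minimal Python sketch of that loop follows; it is not the script from the linked write-up. The `classify` and `repeat_build` names and the example `make -j16` command are made up for illustration, and the assumption that a crashed compiler shows up either as a SIGSEGV exit or as "Segmentation fault" on stderr is mine.

```python
import signal
import subprocess
from collections import Counter


def classify(returncode: int, stderr: str) -> str:
    """Label one build attempt as 'ok', 'segfault', or 'other-failure'."""
    if returncode == 0:
        return "ok"
    # On POSIX a negative return code means the child was killed by a signal.
    # Assumption: a crashed compiler pass otherwise leaves the text
    # "Segmentation fault" somewhere in the build's stderr output.
    if returncode == -signal.SIGSEGV or "Segmentation fault" in stderr:
        return "segfault"
    return "other-failure"


def repeat_build(command: list[str], runs: int) -> Counter:
    """Run `command` `runs` times and tally the outcome of each attempt."""
    tally = Counter()
    for _ in range(runs):
        result = subprocess.run(
            command, capture_output=True, text=True, errors="replace"
        )
        tally[classify(result.returncode, result.stderr)] += 1
    return tally


# Hypothetical usage, run from inside a source tree:
#   print(repeat_build(["make", "-j16"], 50))
```

The command should start from a clean tree each time (for example a small wrapper script that cleans and then builds), otherwise every run after the first does almost no compiling. Intermittent failures only show up over many runs, so a single clean build says very little.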


good job AMD for fixing your shit /s

 

Epyc CPUs confirmed to suffer from seg faults: 

 

[Image: 1eGj6tW.png]

 

 

as far as I know ThreadRipper uses Epyc cores, so .. TR seg faults too

 

---

 

from what I'm seeing, seems like AMD won't be fixing this via a microcode update and will need a HW stepping

recall!??!

Edited by zMeul

Seems like a better analogy would be trying to read from a book, seeing something that says "turn to page #number#", and the number is one that isn't in the book.

Ketchup is better than mustard.

GUI is better than Command Line Interface.

Dubs are better than subs


Hopefully someone with more public visibility can post a video, blog, or something to get AMD's attention, so they know that people are aware of this issue and that they can't sweep it under the rug. Maybe then they'll be forced to let the public know what their plans are to fix this, since it affects so many people.

 

They say they're looking into it, but in the meantime no one knows what is going on. Either it can be fixed with microcode, or you need to RMA. There is no point in everyone requesting an RMA if it won't do anything, and there's no point in people waiting for a microcode fix that won't come. Instead they're just keeping everyone in the dark so it won't hurt Ryzen's market buzz, while we're left wondering what is going to happen with a fix.


On 8/5/2017 at 9:26 AM, zMeul said:

-snip-

 

Just stop. Stop right now, stop spreading fake news. Just stop. The segfaults are from a faulty php build test. My god. Stop. 

Do you even fanboy bro?


26 minutes ago, Liltrekkie said:

 

Just stop. Stop right now, stop spreading fake news. Just stop. The segfaults are from a faulty php build test. My god. Stop. 

fake news and trolling gets locked by the mods, no need to have a coronary. 

Grammar and spelling is not indicative of intelligence/knowledge.  Not having the same opinion does not always mean lack of understanding.  


2 hours ago, mr moose said:

fake news and trolling gets locked by the mods, no need to have a coronary. 

To be fair, given the information available at the time the thread was created, this could have been a Zen-architecture-specific bug, so reporting it was valid and useful.

 

As far as the actual segfault issue goes, from what I can tell reading the AMD support thread, this is still an issue. What has been shown to be incorrect is one of the testing methods previously believed to reliably trigger the segfaults. If you see any test with Kill-Ryzen, conftest, or similar in its name, it's likely bogus, so ignore it.

https://www.reddit.com/r/Amd/comments/6rrbsp/epyc_confirmed_to_suffer_from_the_segfault_issue/dl7hv5f/

https://www.reddit.com/r/Amd/comments/6runcc/reported_epyc_segfault_might_not_be_true/

 

People are still complaining about errors when compiling with GCC.

https://community.amd.com/thread/215773?start=675&tstart=0

 

The currently known workaround is to disable address space layout randomization (ASLR).

https://en.wikipedia.org/wiki/Address_space_layout_randomization
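
On Linux the system-wide ASLR setting is exposed in `/proc/sys/kernel/randomize_va_space` (0 = off, 1 = partial, 2 = full), so it is easy to check what a box is currently doing before and after applying the workaround. A small Python sketch, where the helper names are my own and the mode descriptions are paraphrased from the kernel documentation:

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysctl file that holds the system-wide ASLR mode.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap base and vDSO randomized)",
    2: "full (partial plus heap randomization)",
}


def describe_aslr(raw: str) -> str:
    """Turn the raw file content (e.g. '2\\n') into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown mode {value}")


def current_aslr_mode(path: Path = ASLR_SYSCTL) -> Optional[str]:
    """Return the current ASLR mode, or None if the file cannot be read
    (for example on a non-Linux system)."""
    try:
        return describe_aslr(path.read_text())
    except OSError:
        return None


print(current_aslr_mode())
```

Setting the value to 0 (for example with `sysctl kernel.randomize_va_space=0` as root) turns ASLR off for the whole system, while `setarch "$(uname -m)" -R <command>` turns it off for a single process only.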

 

It is also important to know that this only affects Linux users doing repeated, heavy compiling tasks; there are some reports of systems becoming unstable when idle for long periods, but I cannot confirm that. Windows systems are not affected, but Windows Subsystem for Linux is: it actually is Ubuntu under the hood, so it really is Linux, which is why the bug can happen there, but only under those conditions.

 

The moderation team are not fact checkers or information scrutineers either, we are tasked with making sure the community standards are being followed and that the forum is a place where members can have healthy and productive discussion whether your points of view align or not. If information is not correct or incomplete that is for the community to discuss and contribute to the thread.

 

I do comment a lot in news threads but that is because I am generally interested in computer technology, why else would I be on the forum, but I'm commenting as a member not a moderator.

 

Everyone also needs to be careful not to shoot the messenger: members who create news topics are only reporting what they have found and their opinion on it. However, it is the responsibility of the topic creator to report the news accurately and to include enough information to tell readers what is being reported and why it is relevant; this is part of the posting guidelines for the news section. News topic creators also have to be prepared to enter into discussion about the news item, as that is the point of the forum. If you are asked a question, you should try to answer it if another member hasn't already, but if you do not know the answer you can just say so. There is nothing wrong with not knowing, and you can say "I have no idea, I'm just reporting the news"; it's not like professional journalists know everything about what they report.

 

This is my own view on this and is not a response from the moderation team. 


28 minutes ago, leadeater said:

To be fair, given the information available at the time the thread was created, this could have been a Zen-architecture-specific bug, so reporting it was valid and useful.

 

 

So if it goes from being legit news to trolling it will get locked yes? 


59 minutes ago, leadeater said:

It is also important to know that this only affects Linux users doing repeated, heavy compiling tasks; there are some reports of systems becoming unstable when idle for long periods, but I cannot confirm that. Windows systems are not affected, but Windows Subsystem for Linux is: it actually is Ubuntu under the hood, so it really is Linux, which is why the bug can happen there, but only under those conditions.

Are you sure about that? On page one I linked to a post from a DragonFly BSD developer who had implemented a workaround for (what I assume is) the same issue.

So if that's the case then it's not just a GNU/Linux issue. Any lack of reports of it happening on Windows could be explained by "people who do these types of things generally don't use Windows".


2 minutes ago, LAwLz said:

Are you sure about that? On page one I linked to a post from a DragonFly BSD developer who had implemented a workaround for (what I assume is) the same issue.

So if that's the case then it's not just a GNU/Linux issue. Any lack of reports of it happening on Windows could be explained by "people who do these types of things generally don't use Windows".

Well, I was including BSD in that; I should really have said so, but I figured there was enough info as it is. In one of the forums (I forget which), multiple people went looking for Windows users reporting problems and none could be found.

 

Hardware or microcode bugs may only show up under specific OSes or software, so the fact that no Windows users are reporting issues doesn't mean it's not a hardware problem. There are plenty of people who compile code under Windows, just not with GCC. Remember, all reports so far are for GCC.

 

The other factor is that, across all the Linux systems, disabling ASLR fixes the issue. ASLR is not used in Windows by default, and DLLs have to be specifically tagged as ASLR-enabled.

 

The bit you quoted on page one was later said not to be the case by one of the FreeBSD developers; that's actually what zMeul's last thread source was saying. The following is the bug description and workaround that was accepted and rolled into FreeBSD.

 

Quote

Ryzen (AMD Family 17h) shows stability issues if code is executed near the top of user space. In our case that is the signal trampoline that resides in the amd64 shared page.

 

Move the shared page down by one page on Ryzen as a workaround.

 

The Linux changes are untested.

https://reviews.freebsd.org/D11780
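
To make the address arithmetic in that description concrete, here is a rough Python sketch. It assumes 4 KiB pages, a user address space that ends at 2^47 (the usual amd64 limit with 4-level paging), and that the shared page originally sits in the very last page below that limit. Those numbers and the `shared_page_base` helper are assumptions for illustration and are not taken from the patch itself.

```python
PAGE_SIZE = 4096  # assumed 4 KiB base page size on amd64

# Assumption: user space ends at 2**47 (canonical lower half, 4-level paging).
USER_SPACE_END = 1 << 47


def shared_page_base(pages_below_top: int) -> int:
    """Base address of a page placed `pages_below_top` pages under the
    top of user space."""
    return USER_SPACE_END - pages_below_top * PAGE_SIZE


original = shared_page_base(1)  # last page of user space
moved = shared_page_base(2)     # workaround: one page lower

print(hex(original))  # 0x7ffffffff000
print(hex(moved))     # 0x7fffffffe000
```

Under these assumptions the workaround leaves the topmost user page unused, so nothing executes right at the top of user space.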


22 minutes ago, leadeater said:

Well, I was including BSD in that; I should really have said so, but I figured there was enough info as it is. In one of the forums (I forget which), multiple people went looking for Windows users reporting problems and none could be found.

 

Hardware or microcode bugs may only show up under specific OSes or software, so the fact that no Windows users are reporting issues doesn't mean it's not a hardware problem. There are plenty of people who compile code under Windows, just not with GCC. Remember, all reports so far are for GCC.

 

The other factor is that, across all the Linux systems, disabling ASLR fixes the issue. ASLR is not used in Windows by default, and DLLs have to be specifically tagged as ASLR-enabled.

 

The bit you quoted on page one was later said not to be the case by one of the FreeBSD developers; that's actually what zMeul's last thread source was saying. The following is the bug description and workaround that was accepted and rolled into FreeBSD.

 

https://reviews.freebsd.org/D11780

Considering that some users have had the issues go away after an RMA and new silicon, it looks like part of the problem is actually just early versions of the Zen package having an over-voltage or other power issue when stressed in a very particular way.


1 hour ago, leadeater said:

Well, I was including BSD in that; I should really have said so, but I figured there was enough info as it is. In one of the forums (I forget which), multiple people went looking for Windows users reporting problems and none could be found.

When I first built my system, I did have issues with the random hard lock while I was away at work.  I'd leave my computer running (not in sleep, though the monitor went to sleep), only to come back and find it completely unresponsive.  Having said that, the latest BIOS has apparently resolved that issue.  It certainly could be a bug in Ryzen that was fixed with the microcode update (AGESA 1.0.0.6).

 

I have no issues with zMeul - or anyone else, for that matter - posting about issues with Ryzen. It's to be expected that a new platform will have some issues, especially with such a radical change in architecture from their previous generations. My issue with him was how he went about it. I was always curious about why he hated AMD so much, but he never went into details.


Each time a new platform or CPU architecture is released, it has its initial bugs.

Both AMD and Intel suffer from bugs at launch.

It always takes some time for those issues to get resolved.

 

11 minutes ago, Jito463 said:

I have no issues with zMeul - or anyone else, for that matter - posting about issues with Ryzen. It's to be expected that a new platform will have some issues, especially with such a radical change in architecture from their previous generations. My issue with him was how he went about it. I was always curious about why he hated AMD so much, but he never went into details.

zMeul's problem is that he only sees one side of the coin. I have never actually seen him post in an Intel-problem thread.

In my opinion he is not able to post constructive, unbiased content.

 

 


8 hours ago, Ravager911 said:

Each time a new platform or CPU architecture is released, it has its initial bugs.

Both AMD and Intel suffer from bugs at launch.

It always takes some time for those issues to get resolved.

 

zMeul's problem is that he only sees one side of the coin. I have never actually seen him post in an Intel-problem thread.

In my opinion he is not able to post constructive, unbiased content.

 

 

The problem is twofold: zMeul admits to hating AMD and sometimes gets carried away. But when the news is accurate and the problem is real, the attacks and hatred from the other side are just as much of an issue, because instead of concentrating on the facts, zMeul becomes the target. Basically these are just internet forum issues.


AMD has confirmed that they have been able to replicate the fault in house and they are working with people believed to be affected.

 

https://community.amd.com/message/2816382#comment-2816382#2816382 (Post 696 by amdmatt).

 

It has also been confirmed that the faults occurring with Epyc and Threadripper are NOT related to the CPU; instead, they are a known software problem.

 


16 hours ago, Ravager911 said:

zMeul's problem is that he only sees one side of the coin. I have never actually seen him post in an Intel-problem thread.

In my opinion he is not able to post constructive, unbiased content.

He has done so several times. He has posted about several Intel and Nvidia problems in the past, but people seem to forget about those posts since this forum is a really big echo chamber about how bad zMeul is. He has made several constructive posts. This thread is an example of one such instance.

He says which program is affected, and on which platform. He mentions which programs are not affected. He explains what the issue is, and links multiple sources. He then goes on to say that AMD has not identified the issue yet. No bullshit, just facts. Then people start attacking him as soon as they realize the issue is real and there isn't really any way to spin it (but people sure did try).

I'm not sure if posts have been deleted in this thread, but seriously, go and read the first page and you will see exactly what I mean.

 

(That's not to say he isn't biased, because even he himself will say he hates AMD).

 

 

 

Just noticed that he is banned. RIP zMeul. I will miss you and I think the mods are making a very big mistake. Basically, this shows that you can bully a member until they get banned, and I am sure the massive AMD brigade on this forum will utilize this as much as possible.


9 minutes ago, LAwLz said:

Just noticed that he is banned. RIP zMeul. I will miss you and I think the mods are making a very big mistake. Basically, this shows that you can bully a member until they get banned, and I am sure the massive AMD brigade on this forum will utilize this as much as possible.

I'm pretty sure it was because he was openly confrontational with the mods about a post closure, instead of handling it through PM.  The aforementioned thread has been hidden, however.


6 hours ago, Derangel said:

AMD has confirmed that they have been able to replicate the fault in house and they are working with people believed to be affected.

It seems that this issue may be fixed soon.

1 hour ago, LAwLz said:

He has done so several times. He has posted about several Intel and Nvidia problems in the past, but people seem to forget about those posts since this forum is a really big echo chamber about how bad zMeul is. He has made several constructive posts. This thread is an example of one such instance.

He says which program is affected, and on which platform. He mentions which programs are not affected. He explains what the issue is, and links multiple sources. He then goes on to say that AMD has not identified the issue yet. No bullshit, just facts. Then people start attacking him as soon as they realize the issue is real and there isn't really any way to spin it (but people sure did try).

I'm not sure if posts have been deleted in this thread, but seriously, go and read the first page and you will see exactly what I mean.

 

(That's not to say he isn't biased, because even he himself will say he hates AMD).

 

 

 

Just noticed that he is banned. RIP zMeul. I will miss you and I think the mods are making a very big mistake. Basically, this shows that you can bully a member until they get banned, and I am sure the massive AMD brigade on this forum will utilize this as much as possible.

I have only been on the LTT forums for a short time, but it's obvious that zMeul dislikes AMD.

I honestly don't mind; people are entitled to their opinions, and even though I don't share his stance on AMD, I fully support him being able to say what he feels.

That is, as long as he stays civil, of course.

 


Locked.

 

It seems everyone is using this thread to now discuss @zMeul.

 

As a reminder, we don't suspend someone for the fun of it, most of you will never see what happens behind the scenes and that's ok, but suspensions (permanent or temporary) are done for a reason. There are also a few places where the mods and admin can discuss all this, and final decisions are always a team effort.

 

With that said, we also don't allow discussions about the moderation of a specific incident or user, as per the Community Standards:

Quote

Moderation & Bans

  • Do not openly discuss the moderation of any content or user. If you have an issue, please contact a staff member.
  • Do not backseat moderate – if there’s an issue, please use the report function.
  • Please be aware staff cannot see private messages unless they are reported.
  • If you have an issue with a moderator at any time, please contact an Administrator via PM @Slick  @Whaler_99  @Windspeed36

 

If you need help with your forum account, please use the Forum Support form!


This topic is now closed to further replies.
