PC shuts off under load it seems. Kernel-power 41.

Draysh

My pc has started shutting off seemingly under load.

I have no idea how to translate this wall of text into what the problem actually is. My current guess is the PSU, but if one of you knows what the issue is, please let me know.

Thanks in advance!

 

Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          12-12-2023 15:02:32
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (70368744177664),(2)
User:          SYSTEM
Computer:      Draysh
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331c3b3a-2005-44c2-ac5e-77220c37d6b4}" />
    <EventID>41</EventID>
    <Version>9</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000400000000002</Keywords>
    <TimeCreated SystemTime="2023-12-12T14:02:32.6649960Z" />
    <EventRecordID>150934</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>Draysh</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">80</Data>
    <Data Name="BugcheckParameter1">0xfffff906ad24c372</Data>
    <Data Name="BugcheckParameter2">0x10</Data>
    <Data Name="BugcheckParameter3">0xfffff906ad24c372</Data>
    <Data Name="BugcheckParameter4">0x2</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
    <Data Name="BootAppStatus">0</Data>
    <Data Name="Checkpoint">0</Data>
    <Data Name="ConnectedStandbyInProgress">false</Data>
    <Data Name="SystemSleepTransitionsToOn">0</Data>
    <Data Name="CsEntryScenarioInstanceId">0</Data>
    <Data Name="BugcheckInfoFromEFI">false</Data>
    <Data Name="CheckpointStatus">0</Data>
    <Data Name="CsEntryScenarioInstanceIdV2">0</Data>
    <Data Name="LongPowerButtonPressDetected">false</Data>
    <Data Name="LidReliability">false</Data>
    <Data Name="InputSuppressionState">0</Data>
    <Data Name="PowerButtonSuppressionState">0</Data>
    <Data Name="LidState">3</Data>
  </EventData>
</Event>


Update:

It seems to be related to a module named amdppm. I'm not really sure what this means, but it doesn't seem to be a PSU issue?

SYMBOL_NAME:  amdppm!C1Idle+11

MODULE_NAME: amdppm

IMAGE_NAME:  amdppm.sys

IMAGE_VERSION:  10.0.23424.1000

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  11

FAILURE_BUCKET_ID:  AV_X_(null)_BAD_IP_amdppm!C1Idle

OS_VERSION:  10.0.22621.1

BUILDLAB_STR:  ni_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {df263a6a-9f2c-df2d-c5c7-6325390ff21b}

Followup:     MachineOwner
---------


That error message just tells you that your PC restarted without properly shutting down. That can have a number of causes.

If your PC shuts down when you increase the load, the PSU is a likely culprit, but it could be other components as well.
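
The Kernel-Power 41 event does carry a bit more than the generic message: its EventData section includes a BugcheckCode and four parameters. As far as I know, Windows logs that code in decimal, so the 80 in the posted XML would correspond to stop code 0x50. Below is a minimal Python sketch for pulling those fields out, assuming the event XML has been copied out of Event Viewer as text. The function name `bugcheck_fields` and the trimmed `SAMPLE` event are illustrative only, not part of any Windows tool.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <Event xmlns="..."> line in the posted XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def bugcheck_fields(event_xml):
    """Return (stop_code, [param1..param4]) from a Kernel-Power 41 event."""
    root = ET.fromstring(event_xml)
    data = {
        item.attrib["Name"]: (item.text or "")
        for item in root.findall("e:EventData/e:Data", NS)
    }
    # Assumption: BugcheckCode is written in decimal, the parameters in hex.
    code = int(data["BugcheckCode"])
    params = [int(data[f"BugcheckParameter{i}"], 16) for i in range(1, 5)]
    return code, params


# Trimmed copy of the first event in the thread (hypothetical variable name).
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="BugcheckCode">80</Data>
    <Data Name="BugcheckParameter1">0xfffff906ad24c372</Data>
    <Data Name="BugcheckParameter2">0x10</Data>
    <Data Name="BugcheckParameter3">0xfffff906ad24c372</Data>
    <Data Name="BugcheckParameter4">0x2</Data>
  </EventData>
</Event>"""

code, params = bugcheck_fields(SAMPLE)
print(f"stop code: 0x{code:X}")
print("parameters:", [hex(p) for p in params])
```

A BugcheckCode of 0 would suggest no stop code was recorded at all, which is the case that points more toward a hard power loss. Here the value is non-zero, so the system blue-screened before restarting.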

 

Can you reliably trigger this? If you run a stress test on just the CPU or just the GPU, does your PC also shut down or does it only happen if you run both together?
Are there maybe some other logs in the event viewer? Something before your PC restarted would be great.

 

Can you give a detailed list of your hardware? RTX 3000 series cards are notorious for tripping the overcurrent protection on your PSU, even when it's a good model and in spec.


2 minutes ago, adm0n said:

That error message just tells you that your PC restarted without properly shutting down. That can have a number of causes.

If your PC shuts down when you increase the load, the PSU is a likely culprit, but it could be other components as well.

 

Can you reliably trigger this? If you run a stress test on just the CPU or just the GPU, does your PC also shut down or does it only happen if you run both together?
Are there maybe some other logs in the event viewer? Something before your PC restarted would be great.

 

Can you give a detailed list of your hardware? RTX 3000 series cards are notorious for tripping the overcurrent protection on your PSU, even when it's a good model and in spec.

Hardware:

RTX 3070

Ryzen 5600X

16 GB RAM, 3600 MHz

 

There are no other "critical" logs in the event viewer, and I'm not really sure how to see them in chronological order, so to speak.
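
One way to get events into chronological order outside Event Viewer is to sort them by the TimeCreated SystemTime attribute that each event's XML carries. The following is a rough Python sketch under the assumption that each event has been copied out as its own XML string. The names `event_time` and `sort_events` and the two trimmed sample events are made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def event_time(event_xml):
    """Parse <TimeCreated SystemTime="..."> into a UTC datetime."""
    root = ET.fromstring(event_xml)
    raw = root.find("e:System/e:TimeCreated", NS).attrib["SystemTime"]
    raw = raw.rstrip("Z")
    if "." in raw:
        # The log stores 7 fractional digits; strptime accepts at most 6.
        whole, frac = raw.split(".")
        raw = f"{whole}.{frac[:6]}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)


def sort_events(events):
    """Return the event XML strings ordered oldest first."""
    return sorted(events, key=event_time)


def make_event(stamp, record_id):
    # Helper that builds a trimmed event for the demo (hypothetical).
    return (
        '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
        f'<System><TimeCreated SystemTime="{stamp}" />'
        f"<EventRecordID>{record_id}</EventRecordID></System></Event>"
    )


# Timestamps and record IDs copied from the two events posted in this thread.
later = make_event("2023-12-12T14:44:01.5012424Z", 151068)
earlier = make_event("2023-12-12T14:02:32.6649960Z", 150934)

for xml_text in sort_events([later, earlier]):
    print(event_time(xml_text).isoformat())
```

Note that SystemTime is in UTC, which is why it reads one hour earlier than the local time shown in the Date field of the posted events.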

 

It has happened twice now after playing Warframe for about 20 minutes. It does not happen when I'm not playing a game.

Is there a way I could isolate the CPU and GPU to test which might be the culprit?


13 minutes ago, Draysh said:

Seems to be related to a module named amdppm.

That is the AMD Processor Power Management driver, a module related to CPU power management. This error and Event ID 41 can be caused by many things.

Let's not get laser-focused on the PSU for now.

 

Like @adm0n said, let's try first to make it crash consistently.

Try running a memory test and a CPU stress test like Prime95 to help narrow down the issue.

It would be useful if you could include more memory dumps too.

Quote my reply if you want me to answer back.


To add to @Yua's answer, you can use Prime95 to stress your CPU and Furmark to stress your GPU.

It's important to note, though, that these stress tests do not represent a real-life use case, so things like the transient spikes from your GPU are less likely to occur, as far as I know.

If the issue doesn't pop up immediately after you start a game, it's a bit harder to troubleshoot, sadly.

It would also be good to know what PSU model you have.


8 minutes ago, Yua said:

That is the AMD Processor Power Management driver, a module related to CPU power management. This error and Event ID 41 can be caused by many things.

Let's not get laser-focused on the PSU for now.

 

Like @adm0n said, let's try first to make it crash consistently.

Try running a memory test and a CPU stress test like Prime95 to help narrow down the issue.

It would be useful if you could include more memory dumps too.

I ran the Prime95 test, and it seems my PC blue-screened at the exact time the test finished, so I think you're on to something 🙂

Is there anything from this test I could send to help you find the problem?


2 minutes ago, adm0n said:

To add to @Yua's answer, you can use Prime95 to stress your CPU and Furmark to stress your GPU.

It's important to note, though, that these stress tests do not represent a real-life use case, so things like the transient spikes from your GPU are less likely to occur, as far as I know.

If the issue doesn't pop up immediately after you start a game, it's a bit harder to troubleshoot, sadly.

It would also be good to know what PSU model you have.

My PSU is 100% EVGA.

I'm not entirely sure of the model, other than that it is 750 W and modular.

I believe it could be the EVGA SuperNOVA 750 GQ, as the box seems familiar.


2 minutes ago, Draysh said:

My PSU is 100% EVGA.

I'm not entirely sure of the model, other than that it is 750 W and modular.

I believe it could be the EVGA SuperNOVA 750 GQ, as the box seems familiar.

Well, for now you can hunt the CPU problem. As @Yua said:

16 minutes ago, Yua said:

It would be useful if you could include more memory dumps too.

If your PC crashed in a way that allows one to be created, you should have either a smaller file at: C:\Windows\Minidump

or a bigger one called Memory.dmp

The smaller one would be helpful. But if it is a power or CPU problem, the PC might stop working before it has the chance to create these files.
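
To check quickly whether any minidumps were written, and which one is the newest, something like the following Python sketch could be used. It assumes the default C:\Windows\Minidump location mentioned above. The function name `list_minidumps` is made up for this example, and reading that folder may require an elevated prompt.

```python
from pathlib import Path


def list_minidumps(folder=r"C:\Windows\Minidump"):
    """Return .dmp files in the folder, newest first (empty list if none)."""
    root = Path(folder)
    if not root.is_dir():
        # No folder usually means no minidump has been written yet.
        return []
    try:
        dumps = [p for p in root.iterdir() if p.suffix.lower() == ".dmp"]
    except PermissionError:
        # Assumption: without admin rights the folder may not be readable.
        return []
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


for dump in list_minidumps():
    print(dump.name, dump.stat().st_size, "bytes")
```

If the list stays empty after a crash, that would fit the case described above where the PC stops before a dump can be written.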


6 minutes ago, Draysh said:

my PC blue-screened at the exact time the test finished

What error did you get?

 

4 minutes ago, Draysh said:

EVGA SuperNOVA 750 GQ

If working properly, this PSU shouldn't have any issues running your hardware.

 

Does running the test again crash your PC?

Did a new entry pop up in the event viewer?


1 minute ago, Yua said:

What error did you get?

 

If working properly, this PSU shouldn't have any issues running your hardware.

 

Does running the test again crash your PC?

Did a new entry pop up in the event viewer?

I will try running the test again and see if it happens.

For now, here are all the entries and dumps I could find.

Error from Prime95:

[Tue Dec 12 15:43:31 2023]
FATAL ERROR: Rounding was 0.5, expected less than 0.4

 

Entry:

Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          12-12-2023 15:44:01
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (70368744177664),(2)
User:          SYSTEM
Computer:      Draysh
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331c3b3a-2005-44c2-ac5e-77220c37d6b4}" />
    <EventID>41</EventID>
    <Version>9</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000400000000002</Keywords>
    <TimeCreated SystemTime="2023-12-12T14:44:01.5012424Z" />
    <EventRecordID>151068</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>Draysh</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">80</Data>
    <Data Name="BugcheckParameter1">0xffffc004748288d0</Data>
    <Data Name="BugcheckParameter2">0x0</Data>
    <Data Name="BugcheckParameter3">0xfffff807669b26c9</Data>
    <Data Name="BugcheckParameter4">0x2</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
    <Data Name="BootAppStatus">0</Data>
    <Data Name="Checkpoint">0</Data>
    <Data Name="ConnectedStandbyInProgress">false</Data>
    <Data Name="SystemSleepTransitionsToOn">0</Data>
    <Data Name="CsEntryScenarioInstanceId">0</Data>
    <Data Name="BugcheckInfoFromEFI">false</Data>
    <Data Name="CheckpointStatus">0</Data>
    <Data Name="CsEntryScenarioInstanceIdV2">0</Data>
    <Data Name="LongPowerButtonPressDetected">false</Data>
    <Data Name="LidReliability">false</Data>
    <Data Name="InputSuppressionState">0</Data>
    <Data Name="PowerButtonSuppressionState">0</Data>
    <Data Name="LidState">3</Data>
  </EventData>
</Event>

 

Minidump:

5: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffc004748288d0, memory referenced.
Arg2: 0000000000000000, X64: bit 0 set if the fault was due to a not-present PTE.
	bit 1 is set if the fault was due to a write, clear if a read.
	bit 3 is set if the processor decided the fault was due to a corrupted PTE.
	bit 4 is set if the fault was due to attempted execute of a no-execute PTE.
	- ARM64: bit 1 is set if the fault was due to a write, clear if a read.
	bit 3 is set if the fault was due to attempted execute of a no-execute PTE.
Arg3: fffff807669b26c9, If non-zero, the instruction address which referenced the bad memory
	address.
Arg4: 0000000000000002, (reserved)

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : AV.Type
    Value: Read

    Key  : Analysis.CPU.mSec
    Value: 2343

    Key  : Analysis.Elapsed.mSec
    Value: 6864

    Key  : Analysis.IO.Other.Mb
    Value: 0

    Key  : Analysis.IO.Read.Mb
    Value: 0

    Key  : Analysis.IO.Write.Mb
    Value: 0

    Key  : Analysis.Init.CPU.mSec
    Value: 421

    Key  : Analysis.Init.Elapsed.mSec
    Value: 3891

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 97

    Key  : Bugcheck.Code.LegacyAPI
    Value: 0x50

    Key  : Failure.Bucket
    Value: AV_R_(null)_nt!RtlCompressBufferXpressLzStandard

    Key  : Failure.Hash
    Value: {2eaf1604-dd87-7d64-b274-d3b552f8b4b3}

    Key  : WER.OS.Branch
    Value: ni_release

    Key  : WER.OS.Version
    Value: 10.0.22621.1


BUGCHECK_CODE:  50

BUGCHECK_P1: ffffc004748288d0

BUGCHECK_P2: 0

BUGCHECK_P3: fffff807669b26c9

BUGCHECK_P4: 2

FILE_IN_CAB:  121223-15593-01.dmp

READ_ADDRESS: fffff8076731c468: Unable to get MiVisibleState
Unable to get NonPagedPoolStart
Unable to get NonPagedPoolEnd
Unable to get PagedPoolStart
Unable to get PagedPoolEnd
unable to get nt!MmSpecialPagesInUse
 ffffc004748288d0 

MM_INTERNAL_CODE:  2

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT:  1

PROCESS_NAME:  MemCompression

TRAP_FRAME:  ffffd189ab8bf7a0 -- (.trap 0xffffd189ab8bf7a0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00000000000002da rbx=0000000000000000 rcx=0000000000000041
rdx=ffffc00425303230 rsi=0000000000000000 rdi=0000000000000000
rip=fffff807669b26c9 rsp=ffffd189ab8bf930 rbp=ffffc00425b286c5
 r8=ffffc004748288d0  r9=ffffc00425312f29 r10=ffffc00425b286c5
r11=ffffc00425312f4f r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz ac pe nc
nt!RtlCompressBufferXpressLzStandard+0x139:
fffff807`669b26c9 413808          cmp     byte ptr [r8],cl ds:ffffc004`748288d0=??
Resetting default scope

STACK_TEXT:  
ffffd189`ab8bf578 fffff807`66a8152f     : 00000000`00000050 ffffc004`748288d0 00000000`00000000 ffffd189`ab8bf7a0 : nt!KeBugCheckEx
ffffd189`ab8bf580 fffff807`66853cac     : ffffd189`ab8bf978 00000000`00000000 ffffd189`ab8bf739 00000000`00000000 : nt!MiSystemFault+0x23079f
ffffd189`ab8bf680 fffff807`66a39529     : 00000000`0000000a fffff807`66bb13a9 ffffc004`25312f29 ffffc004`1f1b5000 : nt!MmAccessFault+0x29c
ffffd189`ab8bf7a0 fffff807`669b26c9     : ffffd189`ab8bf958 00000000`00000000 00000000`00000000 00000008`00000003 : nt!KiPageFault+0x369
ffffd189`ab8bf930 fffff807`669b2581     : ffffc004`25312f29 00000000`00000000 ffffc004`25b28070 ffffc004`25311f4f : nt!RtlCompressBufferXpressLzStandard+0x139
ffffd189`ab8bf9e0 fffff807`6697532f     : ffffc004`083e5570 fffff807`6687d05a ffffc004`255f9000 ffffc004`2511e930 : nt!RtlCompressBufferXpressLz+0x61
ffffd189`ab8bfa40 fffff807`66bacf8e     : ffffc004`083e5500 00000000`255f9070 ffffc004`255f9028 fffff807`66bad163 : nt!RtlCompressBuffer+0x6f
ffffd189`ab8bfaa0 fffff807`66aee69b     : ffffc004`083e5500 ffffc004`083e5010 ffffc004`083e5500 ffffc004`1bdbd1c0 : nt!SMKM_STORE_MGR<SM_TRAITS>::SmCompressCtxProcessEntry+0x92
ffffd189`ab8bfb30 fffff807`669089b7     : ffffffff`fd050f80 ffffc004`20243080 fffff807`669e6ee0 ffffc004`25303000 : nt!SMKM_STORE_MGR<SM_TRAITS>::SmCompressCtxWorkerThread+0x1077bb
ffffd189`ab8bfbb0 fffff807`66a2d854     : ffff9800`76e51180 ffffc004`20243080 fffff807`66908960 00000000`00000246 : nt!PspSystemThreadStartup+0x57
ffffd189`ab8bfc00 00000000`00000000     : ffffd189`ab8c0000 ffffd189`ab8b9000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34


SYMBOL_NAME:  nt!RtlCompressBufferXpressLzStandard+139

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

IMAGE_VERSION:  10.0.22621.1020

STACK_COMMAND:  .cxr; .ecxr ; kb

BUCKET_ID_FUNC_OFFSET:  139

FAILURE_BUCKET_ID:  AV_R_(null)_nt!RtlCompressBufferXpressLzStandard

OS_VERSION:  10.0.22621.1

BUILDLAB_STR:  ni_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {2eaf1604-dd87-7d64-b274-d3b552f8b4b3}

Followup:     MachineOwner
---------
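
The Arg2 description in the analysis above can be read as a bit field. Here is a small Python sketch that applies the x64 bit meanings exactly as the dump text lists them. The function name `decode_arg2` and the single "access" label are my own simplification, not WinDbg output.

```python
def decode_arg2(value):
    """Split the x64 Arg2 of bugcheck 0x50 into the bits named in the dump."""
    not_present = bool(value & 0x01)    # bit 0: not-present PTE
    write = bool(value & 0x02)          # bit 1: write if set, read if clear
    corrupted_pte = bool(value & 0x08)  # bit 3: corrupted PTE
    nx_execute = bool(value & 0x10)     # bit 4: execute of a no-execute PTE

    # Simplification (assumption): report one access type, execute first.
    if nx_execute:
        access = "execute"
    elif write:
        access = "write"
    else:
        access = "read"

    return {
        "access": access,
        "not_present_pte": not_present,
        "corrupted_pte": corrupted_pte,
    }


# BugcheckParameter2 values from the two Kernel-Power events in this thread.
for label, raw in (("first event", "0x10"), ("second event", "0x0")):
    print(label, decode_arg2(int(raw, 16)))
```

Read this way, the second crash decodes as a plain read, which agrees with the "AV.Type: Read" line in the analysis. The first one decodes as an execute fault, which appears consistent with the AV_X bucket name in the earlier amdppm output.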


Did you run the memory test?


Just now, Yua said:

Did you run the memory test?

Uhm, I just ran the default test, I think.

This time it seems to have had a "hardware failure" on 3 cores, but the other 3 cores seem to be finishing the test without crashing this time.

Results:

[Tue Dec 12 15:59:17 2023]
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
[Tue Dec 12 16:04:30 2023]
Self-test 960K passed!
Self-test 960K passed!
Self-test 960K passed!

 

How would I go about running the memory test?


1 minute ago, Draysh said:

How would I go about running the memory test?

Just search in your Windows search bar for "Windows Memory Diagnostic" and run the test.


1 minute ago, Draysh said:

Uhm, I just ran the default test, I think.

This time it seems to have had a "hardware failure" on 3 cores, but the other 3 cores seem to be finishing the test without crashing this time.

Results:

[Tue Dec 12 15:59:17 2023]
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
[Tue Dec 12 16:04:30 2023]
Self-test 960K passed!
Self-test 960K passed!
Self-test 960K passed!

 

How would I go about running the memory test?

For the memory test, the simplest option is to type "Windows Memory Diagnostic" into the Windows search bar and run it.

 

Did you by any chance play around with AMD Precision Boost Overdrive to undervolt / overclock your CPU? Because that is usually the error you get if you have an unstable undervolt.
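
To compare runs before and after changing the settings, the pasted Prime95 results can be tallied per timestamp. Below is a rough Python sketch that assumes the log keeps the exact line shapes shown in the results above. The function name `summarize_results` is illustrative and not part of Prime95.

```python
import re

# Matches the bracketed timestamp lines, e.g. [Tue Dec 12 15:59:17 2023]
STAMP = re.compile(r"^\[(.+)\]$")


def summarize_results(log_text):
    """Count FATAL ERROR and passed self-test lines under each timestamp."""
    summary = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        match = STAMP.match(line)
        if match:
            current = match.group(1)
            summary.setdefault(current, {"errors": 0, "passed": 0})
        elif current is None:
            continue  # ignore anything before the first timestamp
        elif line.startswith("FATAL ERROR"):
            summary[current]["errors"] += 1
        elif line.startswith("Self-test") and line.endswith("passed!"):
            summary[current]["passed"] += 1
    return summary


# Results copied from the post above.
RESULTS = """[Tue Dec 12 15:59:17 2023]
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected running 960K FFT size, consult stress.txt file.
[Tue Dec 12 16:04:30 2023]
Self-test 960K passed!
Self-test 960K passed!
Self-test 960K passed!
"""

for stamp, counts in summarize_results(RESULTS).items():
    print(stamp, counts)
```

A run at stock settings that shows zero errors over a comparable duration would support the unstable undervolt / overclock theory.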


2 minutes ago, Yua said:

Just search in your Windows search bar for "Windows Memory Diagnostic" and run the test.

Oh yeah, I ran that yesterday and it reported no issues.

 

1 minute ago, adm0n said:

For the memory test, the simplest option is to type "Windows Memory Diagnostic" into the Windows search bar and run it.

 

Did you by any chance play around with AMD Precision Boost Overdrive to undervolt / overclock your CPU? Because that is usually the error you get if you have an unstable undervolt.

[Attached screenshot: image.png]

These are my settings. Should I just set it to default (off) and see if I crash again?


Just now, Draysh said:

Oh yeah, I ran that yesterday and it reported no issues.

[Attached screenshot: image.png]

These are my settings. Should I just set it to default (off) and see if I crash again?

Set it to default, yeah. I searched for your Prime95 errors and they seem to be OC related.


2 minutes ago, Yua said:

Set it to default, yeah. I searched for your Prime95 errors and they seem to be OC related.

Okay, I have applied the default settings and will try to play for a while and see if it's fixed.

 

For now, thank you so much for your help @Yua @adm0n


If removing the OC fixes it, or even if it doesn't, you can try giving the core +0.05 V and rerunning the test to see if it becomes stable.

That way, you won't lose performance.
