Random NVMe error logs

rakinar2 · April 3, 2025

Hi there,

These are my PC specs:

CPU: i9-14900K

Board: MSI PRO Z790-A MAX WIFI

Storage: Kingston Fury Renegade 1TB PCIe 4 SSD

OS: Ubuntu 24.10 on Linux 6.11.0-21-generic

The problem is, when I check NVMe error logs using the `nvme-cli` tool on Linux, I see a bunch of errors:

$ sudo nvme error-log /dev/nvme1

Error Log Entries for device:nvme1 entries:63
.................
Entry[ 0]
.................
error_count    : 73484
sqid        : 0
cmdid        : 0x600a
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag    : 0
parm_err_loc    : 0x102c
lba        : 0
nsid        : 0
vs        : 0
trtype        : The transport type is not indicated or the error is not transport related.
csi        : 0
opcode        : 0
cs        : 0
trtype_spec_info: 0
log_page_version: 0
.................

[ keeps going on ... ]

Entry[62]
.................
error_count    : 73422
sqid        : 0
cmdid        : 0x5000
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag    : 0
parm_err_loc    : 0x102c
lba        : 0
nsid        : 0
vs        : 0
trtype        : The transport type is not indicated or the error is not transport related.
csi        : 0
opcode        : 0
cs        : 0
trtype_spec_info: 0
log_page_version: 0
.................

All errors have the same status field and trtype. I'm concerned that my drive is failing, even though it's brand new. Other checks, such as the SMART self-test and the SMART logs, seem to say the drive is fine, apart from the error log entries. Do I need to worry?

Smart logs:

$ sudo nvme smart-log /dev/nvme1

Smart Log for NVME device:nvme1 namespace-id:ffffffff
critical_warning            : 0
temperature                : 123 °F (324 K)
available_spare                : 100%
available_spare_threshold        : 10%
percentage_used                : 0%
endurance group critical warning summary: 0
Data Units Read                : 977,240 (500.35 GB)
Data Units Written            : 1,109,948 (568.29 GB)
host_read_commands            : 12,065,526
host_write_commands            : 14,552,309
controller_busy_time            : 12
power_cycles                : 49
power_on_hours                : 43
unsafe_shutdowns            : 6
media_errors                : 0
num_err_log_entries            : 73,570
Warning Temperature Time        : 0
Critical Composite Temperature Time    : 0
Temperature Sensor 2            : 131 °F (328 K)
Thermal Management T1 Trans Count    : 0
Thermal Management T2 Trans Count    : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
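
For anyone who wants to read the two hex fields that repeat in every entry above, here is a minimal sketch that unpacks them. It assumes the bit layout from the NVMe base specification and that nvme-cli prints `status_field` with the phase bit already shifted out (it shows `phase_tag` on its own line, which suggests so). The helper names `decode_status` and `decode_parm_err_loc` are made up for illustration.

```shell
#!/bin/sh
# Sketch: unpack the two hex fields that repeat in every error-log entry.
# Assumed layout (NVMe base spec, phase bit already removed by nvme-cli):
#   status_field: bits 7:0 = status code, bits 10:8 = status code type,
#                 bit 13 = More, bit 14 = Do Not Retry
#   parm_err_loc: bits 7:0 = byte offset in the command, bits 10:8 = bit offset

# Hypothetical helper: print the parts of a status_field value.
decode_status() {
    val=$(( $1 ))
    printf 'sc=0x%02x sct=%d more=%d dnr=%d\n' \
        $(( val & 0xff )) $(( (val >> 8) & 0x7 )) \
        $(( (val >> 13) & 1 )) $(( (val >> 14) & 1 ))
}

# Hypothetical helper: print byte, bit and command dword of a parm_err_loc value.
decode_parm_err_loc() {
    val=$(( $1 ))
    byte=$(( val & 0xff ))
    printf 'byte=%d bit=%d dword=%d\n' \
        "$byte" $(( (val >> 8) & 0x7 )) $(( byte / 4 ))
}

decode_status 0x2002
decode_parm_err_loc 0x102c
```

Under those assumptions this prints `sc=0x02 sct=0 more=1 dnr=0` and `byte=44 bit=0 dword=11`: a generic Invalid Field in Command status pointing at the start of command dword 11, on the admin queue (`sqid` 0). That reads like a rejected admin command rather than a media problem, which fits `media_errors` being 0.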

Mumintroll · April 3, 2025

I also have a Kingston Fury Renegade, but the 2TB model.

In SMART field 0F, "Number of Error Information Log Entries", there is a small number that grows on every bootup.

I've tried to find info about it, but wherever I look, everyone else with the same drive seems to have that number growing a little each bootup.

So I'm guessing that model just does some logging, error or no error.

My cheaper Kingston NV2 does not log.

rakinar2 · April 3, 2025

Very weird. This could be a firmware issue, right? I emailed Kingston about it; let's see if they respond.

rakinar2 · April 3, 2025

Also, to be precise, for me, the error log entries increase by 2 every 4-8 seconds, which is very strange.
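
A rough way to put numbers on that rate is to sample the counter on a timer. This is only a sketch: it assumes the `num_err_log_entries` label looks like it does in the smart-log output above (other nvme-cli versions may print it differently), and `parse_err_count` and `watch_err_count` are made-up helper names.

```shell
#!/bin/sh
# Sketch: track num_err_log_entries from `nvme smart-log` text output.

# Hypothetical helper: reads smart-log text on stdin and prints the counter
# with everything except digits (spaces, thousands separators) stripped.
parse_err_count() {
    awk -F: '/^num_err_log_entries/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Hypothetical helper: prints a timestamp and the counter for device $1
# every $2 seconds (default 2). Needs root and nvme-cli; stop with Ctrl-C.
watch_err_count() {
    while :; do
        printf '%s %s\n' "$(date +%T)" "$(nvme smart-log "$1" | parse_err_count)"
        sleep "${2:-2}"
    done
}

# Example (not run here): watch_err_count /dev/nvme1 2
```

Running it once with the monitoring tools active and once with them stopped should show whether the counter only moves while something is polling the drive.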

Mumintroll · April 3, 2025

1 minute ago, rakinar2 said:

Also, to be precise, for me, the error log entries increase by 2 every 4-8 seconds, which is very strange.

Oh ok that's something new.

For me it's every bootup, and that's what I've found it doing for others as well.

If you get an answer from Kingston, let us know.

robertoleonardo · April 15, 2025

On 4/3/2025 at 11:00 AM, rakinar2 said:

Also, to be precise, for me, the error log entries increase by 2 every 4-8 seconds, which is very strange.

FYI - i just got a new renegade drive for proxmox, and i'm seeing the exact same thing -- increases by 2 every 4-8 seconds. was glad to see this post because all the others are years old or relate to the seemingly distinct issue of it incrementing only on bootup.

my ocd doesn't like it, but i think it's probably fine. i wish i could update the firmware but apparently kingston has no way of doing that on linux -- and i'm not about to take apart my pc to drop this drive in just to update it. i'm just going to forget about it and hope it's not an issue, ha.

but yes, if you have heard back from kingston -- would love to hear what they have to say

robertoleonardo · April 15, 2025

i've been digging around further -- for me, it looks like it has to do with my various monitoring utilities (telegraf+lm-sensors+glances for home assistant). i've been able to cut the errors way down by excluding a few sensors that the kingston drive doesn't have but for whatever reason lm-sensors was pinging every few seconds.

i'm still getting a new error or two every minute or so -- something to do with my glances configuration i think. still working on sorting out exactly the issue there...

Townsmcp · April 25, 2025

I am pleased I stumbled on this thread. I am seeing the exact same error messages in Proxmox on a 2TB Fury Renegade when I run `nvme error-log /dev/nvme0` on a new drive that I installed a couple of days ago. Also, when running `sensors` I see the following errors, which seem to be normal for this drive:

nvme-pci-0200
Adapter: PCI adapter
Composite:    +27.9°C  (low  = -20.1°C, high = +83.8°C)
                       (crit = +88.8°C)
ERROR: Can't get value of subfeature temp3_min: I/O error
ERROR: Can't get value of subfeature temp3_max: I/O error
Sensor 2:     +63.9°C  (low  =  +0.0°C, high =  +0.0°C)

When I run `smartctl -a /dev/nvme0`, the `Error Information Log Entries` count just keeps growing: in a couple of days I am up to 1,372. However, if I run Glances (which I have installed in a Docker container), the count goes up by 2 every couple of seconds. When I close the Glances web page, it increments a LOT slower.
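
To see which attributes fail without going through lm-sensors, the hwmon files can be read directly. A minimal sketch, with some assumptions: `find_failing_attrs` is a made-up helper name, the hwmon directory has to be looked up per system, and each failed read will probably add entries to the drive's error log just like a `sensors` run does.

```shell
#!/bin/sh
# Sketch: list temperature attributes under one hwmon directory that
# cannot be read (for example because the read ends in an I/O error).

# Hypothetical helper: $1 is a hwmon directory; prints failing attribute names.
find_failing_attrs() {
    for f in "$1"/temp*_input "$1"/temp*_min "$1"/temp*_max "$1"/temp*_crit; do
        [ -e "$f" ] || continue    # skip patterns that matched nothing
        cat "$f" >/dev/null 2>&1 || printf '%s\n' "${f##*/}"
    done
}

# Example (not run here): find_failing_attrs /sys/class/hwmon/hwmon3
```

On a drive behaving like the one in the `sensors` output above, this would be expected to list `temp3_min` and `temp3_max`, which are the names to exclude in the lm-sensors config.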

@robertoleonardo you say it seems to be various monitoring utils that might be causing this, and that you have been able to cut down the errors by excluding a few sensors that the drive doesn't have. Can you explain what you did/mean by that? I was on the verge of sending the drive back to Amazon, but it seems like it will be fine to keep; I've just got to somehow reduce the errors.

Townsmcp · April 25, 2025

Just to update/add what I have done so far:

With Glances shut down and the temp3 sensor (labelled Sensor 2) excluded from lm-sensors, my `Error Information Log Entries` count no longer increases.

I have created a custom config file for lm-sensors so that it does not poll Sensor 2 anymore:

Identify your drive:
- ```
  sensors
  ```
  - Make a note of the drive name (for me this was nvme-pci-0200)
- ```
  cd /sys/class/hwmon/
  ls -l
  ```
- `cd` into the appropriate directory (for me this is hwmon3)
- ```
  cat temp1_label
  ```
  - this should be the Composite entity
- ```
  cat temp3_label
  ```
  - this should be the offending sensor (Sensor 2)

Then create the config file:
- ```
  cd /etc/sensors.d/
  nano kingston_fury_nvme.conf
  ```
- Paste the following code. This stops lm-sensors polling the sensor (note the 4 spaces preceding the `ignore`):
  ```
  chip "nvme-pci-0200"
      ignore temp3
  ```
- ```
  systemctl restart lm-sensors
  ```
- ```
  smartctl -a /dev/nvme0
  ```
  and observe no more increase in the error log
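
If the chip name or sensor differs on your system, the same stanza can be generated rather than typed by hand. This is only a sketch: `make_ignore_conf` is a made-up helper name, and the chip name and feature (`nvme-pci-0200`, `temp3`) are the values from this post, so swap in whatever `sensors` reports for you.

```shell
#!/bin/sh
# Sketch: print an lm-sensors config stanza that ignores the given features.

# Hypothetical helper: $1 is the chip name, remaining arguments are features.
make_ignore_conf() {
    printf 'chip "%s"\n' "$1"
    shift
    for feat in "$@"; do
        printf '    ignore %s\n' "$feat"
    done
}

# Example (not run here, needs root):
# make_ignore_conf nvme-pci-0200 temp3 > /etc/sensors.d/kingston_fury_nvme.conf
```

Printing to stdout first makes it easy to check the stanza before writing it into `/etc/sensors.d/`.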

I will be back to update this when I have figured out a way of stopping Glances from polling that sensor as well.
