Posted January 26, 2021
I remember a long time ago LTT mentioned that they were trying to work with someone to develop a test to actually measure and detect when ECC is useful, but it seems like they never did that. So I was wondering if someone else might know if it's possible to just run some loop and detect bit flips that don't get past the OS.
Posted January 26, 2021
25 minutes ago, Rugg said: So I was wondering if someone else might know if its possible to just run some loop and detect bit flips that don't get past the OS.
If ECC is working, a 1-bit flip will generally be corrected and logged by the OS; a 2+-bit flip will generally crash the system.
25 minutes ago, Rugg said: develop a test to actually measure and detect when ECC is useful
I might be mistaken, but I think the paid version of one of the Memtest programs actually has that ability.
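For anyone who wants to see whether the OS has logged any ECC events, here is a minimal sketch. It assumes a Linux system with an EDAC driver loaded for the memory controller, in which case the kernel exposes corrected and uncorrected error counters under `/sys/devices/system/edac/mc/`. The function name `read_edac_counts` is made up for this example, and on Windows or on a machine without EDAC support it will simply find nothing.

```python
from pathlib import Path

# Linux EDAC sysfs location (assumption: an EDAC driver is loaded)
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller_name: (corrected, uncorrected)} per memory controller."""
    counts = {}
    if not root.is_dir():
        return counts  # not Linux, or no EDAC driver loaded
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())  # corrected errors
            ue = int((mc / "ue_count").read_text().strip())  # uncorrected errors
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A non-zero corrected count would mean ECC has quietly fixed something since boot; a count of zero only means nothing has been logged so far.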
Posted January 26, 2021 Author
22 minutes ago, AbydosOne said: If the ECC is working, generally if it's a 1-bit flip, it will get logged by the OS; if it's a 2+-bit flip, the system will crash.
And without ECC, how easy should it be to cause a bit flip? I assume that if I just make an array of a million integers, all set to zero, they are all going to be zero every time. Or will one of them sometimes not be zero?
Posted January 27, 2021
4 hours ago, Rugg said: And without ECC how easy should it be to cause a bit flip?
Pretty easy; see, e.g., the Rowhammer exploit.
Posted January 27, 2021
9 hours ago, AbydosOne said: if it's a 2+-bit flip, the system will crash.
Just out of curiosity (the Wikipedia page doesn't mention this), is this because ECC RAM detects the error and shuts down the system? Bit flips on their own may or may not cause a system crash.
Posted January 27, 2021 Author
After watching this video (super interesting btw, HerrSmatzeR#7054 put it in the LTT Discord), it seems reasonable that if you just fill the RAM up with data, it's possible to get a memory error, and if you then read the data back you should get a different result.
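The fill-and-read-back idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names `find_mismatches` and `fill_and_verify` are made up here, the `0xAA` pattern is an arbitrary choice, and a real memory tester works much closer to the hardware. On a machine with working ECC, single-bit flips are corrected before the program ever sees them, so this loop should find nothing there. Without ECC, flips are still rare, so a run that finds nothing proves very little.

```python
import time


def find_mismatches(buf, pattern):
    """Return (offset, value) for every byte in buf that differs from pattern."""
    if buf.count(pattern) == len(buf):  # fast path: every byte still matches
        return []
    return [(i, b) for i, b in enumerate(buf) if b != pattern]


def fill_and_verify(size_bytes, pattern=0xAA, hold_seconds=0.0):
    """Fill a buffer with one byte value, wait, then report any changed bytes."""
    buf = bytearray([pattern]) * size_bytes  # allocate and fill
    time.sleep(hold_seconds)                 # leave the data sitting in memory
    return find_mismatches(buf, pattern)
```

Something like `fill_and_verify(1 << 30, hold_seconds=3600)` would hold 1 GiB for an hour and return a list of changed offsets, which is empty if nothing flipped. Small buffers may be served from CPU cache rather than DRAM, and the OS may page the buffer out, so the buffer needs to be large and the result taken with a grain of salt.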
Posted January 27, 2021
20 minutes ago, Sauron said: Just out of curiosity (the Wikipedia page doesn't mention this), is this because ECC RAM detects the error and shuts down the system? Bit flips on their own may or may not cause a system crash.
Technically it's not the RAM itself that shuts the system down; it's the OS kernel that halts the system when there is an uncorrectable ECC error.
Posted January 27, 2021
1 hour ago, Sauron said: Just out of curiosity (the Wikipedia page doesn't mention this), is this because ECC RAM detects the error and shuts down the system? Bit flips on their own may or may not cause a system crash.
On x86 it causes a machine check exception, for which the OS can install a handler. Windows will typically display a blue screen (which is still better than unknowingly working with possibly corrupted data). But on mission-critical systems the OS or application could attempt recovery.
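To show why a 1-bit flip can be fixed silently while a 2-bit flip can only be reported, here is a toy single-error-correct, double-error-detect (SECDED) code in Python. It is an extended Hamming(8,4) code protecting 4 data bits. ECC DIMMs typically protect 64 data bits with 8 check bits, so this is a scaled-down illustration of the principle rather than what any particular memory controller does, and the names `encode` and `decode` are made up for the example.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # positions 1, 2, 4 hold parity; 0 is overall parity


def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) & 1            # overall parity over positions 1..7
    return code


def decode(code):
    """Return (value, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of the positions of all set bits
    overall = sum(code) & 1                # 1 means an odd number of bits flipped
    if overall:
        code[syndrome] ^= 1                # single flip: syndrome is its position
        status = "corrected"
    elif syndrome:
        return None, "uncorrectable"       # two flips: detected, cannot be located
    else:
        status = "ok"
    value = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return value, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                           # one flipped bit
    print(decode(word))                    # (11, 'corrected')
    word[6] ^= 1                           # a second flipped bit
    print(decode(word))                    # (None, 'uncorrectable')
```

The "uncorrectable" case corresponds to the situation described above: the hardware knows the data is bad but cannot repair it, so it raises an error and leaves the decision to the OS.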