Everything posted by brwainer

  1. In strict terms, the things you asked about are true. However, there are other factors that can mitigate these:
     - The SDWAN appliances can do compression/decompression of the headers within the traffic being tunneled, frequently canceling out the added header of the tunnel itself (at least the one we use can; that’s not a general promise of feature availability).
     - When using a known secure underlay such as MPLS, you can choose to send traffic without encryption (again, at least our SDWAN can), which means only a minimal header is added to get the traffic to the other end.
     - The added latency of encryption and decryption is generally negligible compared to the latency of just getting to the other side (speed of light * distance).
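Some back-of-the-envelope numbers on those last two points. The packet size, tunnel overhead, distance, and per-packet encryption cost below are illustrative assumptions, not vendor specs:

```python
# Rough arithmetic for tunnel overhead and encryption latency.
# Every figure here is an assumption for illustration only.

payload_bytes = 1400          # inner packet size after MSS clamping (assumption)
tunnel_overhead_bytes = 60    # outer IP + tunnel/ESP headers (assumption)

overhead_pct = tunnel_overhead_bytes / (payload_bytes + tunnel_overhead_bytes) * 100
print(f"Tunnel header overhead: ~{overhead_pct:.1f}% of each packet")

# Propagation delay dominates: light in fiber covers roughly 200 km per millisecond.
distance_km = 1000
propagation_ms = distance_km / 200
print(f"One-way propagation over {distance_km} km: ~{propagation_ms:.1f} ms")

# Hardware-assisted encryption on an appliance is typically measured in
# microseconds per packet (assumption), orders of magnitude below that.
encrypt_us_per_packet = 20
print(f"Encrypt/decrypt adds roughly {encrypt_us_per_packet / 1000:.2f} ms per packet")
```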
  2. Also consider that a chassis like that typically includes the drive backplane and often the PSU(s) as well, which are low-volume specialized parts as far as these things go. If you look at the cost breakdown of a Backblaze pod, you’ll see the backplanes cost as much as or more than the bent metal, and they designed fairly simple ones that only need to do SATA, not full SAS.
  3. I’m working at a Fortune 500 company as a Lead Network Engineer. A few weeks ago, three of my colleagues and I flew out to a new datacenter to rack and set up $2 million worth of Cisco and F5 equipment: Nexus 9K switches in a spine-and-leaf topology, Catalyst 9300L switches for out-of-band management, and Catalyst 8500 routers. We performed the initial setup on each one purely from the serial console, and all programming after they came online was done via CLI as well. The VXLAN config will be done via Nexus Fabric Manager, but that’s it, and we’d be comfortable doing it by hand if we hadn’t been given it for free. We’re connecting console servers to provide OOB serial access as well. If we had gone with Arista or Juniper for this deployment, the overall methods would have been the same - I haven’t seen a GUI yet that is good enough to completely replace the speed at which you can get precise information out of a CLI. GUIs are good when you are taking a larger overall view of something, or to enable templates and standardized workflows. Except for when they try to teach their automation tools, anything you learn about Cisco will be transferable to general networking principles and other vendors. I had a networking class in college that just used a CCNA study guide as the course material, and at my first job after that I mainly touched Aruba/HP and later Ruckus/Brocade switches, and the knowledge I gained following the Cisco methods was still useful to me. At my second job, we used hardly any Cisco equipment, and even so, when I talked to my manager about what I should study and which certifications to pursue to further my career both with that company and in general, I was told to continue on the Cisco certification path. At my current employer, when I applied for my first position here, the role was for removing Cisco routers from over 1000 branch locations and replacing them with a non-Cisco SDWAN appliance - and yet the fact that I was CCNA certified was a deciding factor between me and another candidate. Try to recognize in your studies what is an industry standard, such as protocols and RFCs that everyone has to abide by, and what is Cisco’s way of implementing things. Sometimes the way Cisco does things becomes the standard that everyone follows, and sometimes they go off on their own, and it’s only the requirement of interoperability that keeps things minimally compatible.
  4. Entirely depends on what you’re doing. Large files that are sequentially written and read? Yeah, a single drive is going to sustain >1Gbps all day (right around 2.5Gbps at the outside of the disk, where the linear velocity is fastest, slowing down toward the middle). Lots of small files (<1MB), or things read non-sequentially, like programs or video editing? Not a chance without a large array or SSDs.
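To put rough numbers on that - the drive figures below are ballpark assumptions for a 7200 RPM disk, not measurements:

```python
# Why one HDD can saturate gigabit for sequential work but not for small files.
# The drive numbers are ballpark assumptions for a single 7200 RPM disk.

def mb_s_to_gbps(mb_per_s: float) -> float:
    return mb_per_s * 8 / 1000

sequential_outer = 280   # MB/s near the outer tracks (assumption)
sequential_inner = 140   # MB/s near the inner tracks (assumption)
print(f"Sequential: ~{mb_s_to_gbps(sequential_outer):.1f} Gbps outer, "
      f"~{mb_s_to_gbps(sequential_inner):.1f} Gbps inner")

# Small/random I/O is limited by seek and rotational latency, not platter speed.
random_iops = 150        # generous for a single spinning disk (assumption)
avg_file_kb = 256
random_mb_s = random_iops * avg_file_kb / 1024
print(f"Random {avg_file_kb} KB reads: ~{random_mb_s:.0f} MB/s "
      f"(~{mb_s_to_gbps(random_mb_s):.2f} Gbps)")
```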
  5. Unraid or SHR, as mentioned above… One difference is that SHR works more like Drobo’s implementation, meaning you will get more than one drive’s speed most of the time, whereas Unraid will never be faster than a single disk at a time, other than with SSD caching.
  6. There is a middle ground of “DHCP Static” or “DHCP Reservations”. My method is this: anything that is essential for DHCP to run, or to fix it, gets a (true) static IP; everything else that would just be annoying if it changed gets a DHCP static/reservation. For me, where DHCP is done by a VM and the router relays to it for the other VLANs, my static list is:
     - Router
     - Main switch
     - Hypervisors
     - The VMs that do DHCP (HA pair)
     - Desktop PC, to be able to fix stuff
     Everything else is using DHCP, and I just have reservations in the DHCP server. I find it simpler this way.
  7. While some features like the built-in VPN may differ in performance based on CPU/SoC, the CPU chosen will almost always be sufficient to not be the bottleneck in the system - plain routing, NAT, and stateful firewalling at gigabit speeds is not hard.
  8. Did you add that email to your contacts list? In Gmail anything from a contact skips the spam filters.
  9. Pssst, just making sure you’re aware of the BliKVM PCIe version - yes it costs money, but $207 USD (at this moment) is probably a lot cheaper than anything industrial/enterprise you found. It runs a fork of the PiKVM project software. You would use something like your smartphone instead of physical buttons, but you get the added benefit of being able to do this, plus the KVM functionality, away from home (if you set up a VPN to your home router). https://www.aliexpress.us/item/3256804386522898.html?gatewayAdapt=glo2usa4itemAdapt
  10. Agree with the above posters so far. But also, it may be a bad motherboard slot, a bad CPU socket, or a bad CPU. Try swapping the two sticks’ positions - put the “good” stick in the “bad” stick’s slot, and leave the “bad” stick out. See whether you have memory issues - if yes, then it’s probably not the memory stick. Validate by putting the “bad” stick in the “good” slot and taking the “good” stick out again. In other words, you’re looking to see whether the issues move with the memory stick or are tied to the slot. If the issue is tied to the slot, the only advice I have is to remove the CPU, check for bent pins, then replace the CPU and make sure the heatsink isn’t over-torqued.
  11. Sounds like it. Could be a bad solder joint on the SATA port, or damage to the PCB.
  12. Harhar, tin foil hat. But seriously, if you actually want to “turn on and off” the wireless often and quickly, a shielded box that you can place over it is going to be effective. Faster to walk over and remove the box than to log into a page and change a setting (depending on the size of the house). Or, as @Heats with Nvidia says, get your own router. You won’t get anywhere with most ISPs with the argument that other ISPs let customers do this, because your ISP probably has no competition. Or else move to the competition.
  13. If you care about the data at all, you should have regular backups at all times.
  14. You can have the drive be a hot spare at the ZFS level, but not at the bootloader level - there will always be command line work in setting up the replacement drive to be bootable.
  15. They have 1 512GB SSD already and are thinking about buying 3 more.
  16. Setting up the partitions is handled by the installer when you select a ZFS boot type. For replacing a failed drive, the documentation steps direct you to use sgdisk to copy the partition table from a remaining good drive. Yes, the steps should be the same with a RAIDZ1. https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#_zfs_administration -> Changing a failed bootable device Or https://pve.proxmox.com/wiki/ZFS_on_Linux#_zfs_administration - looks to have the same content
  17. So ZFS boot is a little bit of a lie…. Yes, your OS root will live on ZFS and be mirrored, but your motherboard doesn’t understand ZFS. What do we do? The answer is to install the bootloader and kernel to a separate partition that uses a conventional filesystem. This partition is set up the same way on every drive, and when an update happens (new kernel install) that gets done on every drive as well. Therefore, it is possible at the ZFS layer to assign that third drive as a hot spare, but if it ever gets used the boot partition won’t be automatically made and kept in sync. I would just keep the drive around, make sure email alerts from the system are working (SMART errors and ZFS-ZED alerts are automatically sent to the email address provided during install) and when needed follow the Proxmox documentation for how to replace a boot drive in a ZFS mirror.
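For reference, this is roughly the sequence from the Proxmox documentation linked above, wrapped in a dry-run script that only prints the commands. The device names (/dev/sda as the surviving mirror member, /dev/sdb as the replacement) and the partition numbers are assumptions based on a default Proxmox ZFS install - confirm your own layout with lsblk and zpool status before running anything:

```python
# Dry-run sketch of replacing a failed bootable disk in a Proxmox ZFS mirror.
# Device names and partition numbers are ASSUMPTIONS for illustration only;
# check the Proxmox docs and your actual layout before executing any of these.

healthy = "/dev/sda"                        # surviving mirror member (assumption)
new = "/dev/sdb"                            # blank replacement disk (assumption)
failed = "<failed-vdev-from-zpool-status>"  # placeholder; see `zpool status rpool`
pool = "rpool"                              # default Proxmox root pool name

steps = [
    f"sgdisk {healthy} -R {new}",                # copy the partition table to the new disk
    f"sgdisk -G {new}",                          # randomize the new disk's GUIDs
    f"zpool replace -f {pool} {failed} {new}3",  # resilver onto the new ZFS partition
    f"proxmox-boot-tool format {new}2",          # format the new ESP
    f"proxmox-boot-tool init {new}2",            # make the new disk bootable
]

for cmd in steps:
    print(cmd)   # print only - run each by hand once the devices are confirmed
```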
  18. I wouldn’t expect your switch to help much on its own. The segmentation needs to happen at the router.
  19. Having different subnets means that devices won’t expect an IP outside of their subnet to be local, and therefore will use the default gateway instead. If there is a shared router between the subnets, and it doesn’t have firewall policies preventing the two from talking to each other, then it will happily route traffic between the subnets - that is in fact the main job of a true router, the way they were used originally. You mention “Guest Network” - some routers or APs have this function, and it normally includes automatic firewall policies that let the devices on it talk to the internet but not to anything local. Maybe something happened to the settings on your guest network? Did you need to allow it to reach a printer or something like that at some point? Finally, you mention subnets, but you didn’t mention VLANs. If you have two subnets but don’t have VLANs, then those subnets are in the same “broadcast domain” - meaning they will hear broadcast and multicast packets from each other. That may be a contributing factor to the behavior you are seeing. Hopefully this helps you determine what has changed. But I can promise you that the fundamentals of the subnet mask have not.
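If it helps to see the subnet-mask decision concretely, here is a tiny sketch using Python’s ipaddress module; the addresses are made up for illustration:

```python
# How a host uses its subnet mask to decide "local vs. send to the gateway".
# Addresses here are made up purely for illustration.
import ipaddress

my_interface = ipaddress.ip_interface("192.168.10.25/24")   # this host's IP + mask

for dst in ["192.168.10.80", "192.168.20.5"]:
    if ipaddress.ip_address(dst) in my_interface.network:
        print(f"{dst}: same subnet, ARP for it directly")
    else:
        print(f"{dst}: outside my subnet, send to the default gateway")
```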
  20. Alternatively, have you been doing automatic snapshots? What’s the timeframe on those? In ZFS this is the difference between “Used” and “Referred” - Referred being the current data, and Used including snapshots.
  21. What model are the existing switches? Why not get more of that same model?
  22. The 6430s only do 1Gb SFP: https://webresources.ruckuswireless.com/pdf/datasheets/ds-icx-6430-6450.pdf
  23. In the press release they say that the backup system was also affected at the same time, so I think they ran out of space on the underlying array and both VMs used the same storage. Which is obviously a single point of failure for their live DR, but we don’t know why their decisions were made.
  24. I don’t have insight into their system. But I do have insight into a system that is probably of similar complexity, age, and annoying-ness. $dayjob has a custom-built application that was originally developed in the early 1980s. It exclusively runs on Unix mainframes, currently HP-UX systems with Intel Itanium processors, which we’ve been on since the early 2000s. We were one of the companies that bought into the hype of Itanium, but it was already a Unix mainframe application, so porting it from whatever it ran on before to Itanium wasn’t hard, and HP themselves helped with the porting because we were an early customer. Two decades later we are trying to port it to Linux/x64, but day-to-day production still relies on 8 HP-UX systems that take up half a rack each. An entire datacenter is built around supporting them.
     Anyway, relevant to this story: for us the issue isn’t “disk space”. The HP-UX OS is capable of mounting iSCSI shares of any arbitrary size (I believe it’s been patched with ext4 support). The issue is that the system uses files with specially laid out metadata structures as databases. Technically all databases are files at the end of the day - you have to structure the data on disk somehow. The difference is that this is a special type of database written in the early 2000s that is tuned for fast processing by the Itanium CPUs and for being read directly between disk and RAM, and the data structure has to be written out in advance. It’s like formatting a drive before you can use it, or if you’re old enough to remember these things, it’s like writing the sectors onto an HDD or FDD directly. Every time the system is down for maintenance, in addition to their other tasks, the Unix admins run scripts to expand the database files as fast as the system can handle - literally just writing out empty areas at the end of the existing database files for it to fill in with data later. If the system ever caught up with the prepared database area, it would crash and require emergency expansion.
     I suspect it is something like this when they say they ran out of disk space - and it sounds like their application didn’t just halt immediately but instead tried to keep running, and they lost a bunch of data that was either being added or already stored on disk. The two days was probably the time it took to restore the most recent backup and replay/rebuild as much data as they could. The decisions around building the system this way made sense at the time - there’s no use in complaining about decisions made two decades ago. But it’s hard to swap in a diesel engine for the steam locomotive while the train is in motion.
     Edit: Went and read the actual article. This sounds like a more mundane issue than I thought - it literally ran out of disk space when they tried to update it, but in doing so it deleted some data. And they resolved it by recovering to a new server with more space. That’s just bad administration.
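To make the “write out empty areas ahead of time” idea concrete, here is a toy sketch of pre-extending a file with zeroed space; it is only an analogy for the concept, not what their (or our) actual scripts look like:

```python
# Toy illustration of pre-extending a data file with zero-filled space so the
# application never has to grow it during normal operation. Purely an analogy
# for the concept described above; file name and growth size are made up.
import os

DB_FILE = "example.db"            # hypothetical database file
EXTEND_BY = 64 * 1024 * 1024      # grow by 64 MiB per maintenance window (assumption)

# Create the file if it doesn't exist yet.
if not os.path.exists(DB_FILE):
    open(DB_FILE, "wb").close()

before = os.path.getsize(DB_FILE)
with open(DB_FILE, "r+b") as f:
    f.truncate(before + EXTEND_BY)   # append a zero-filled region for future records

print(f"{DB_FILE}: {before} -> {os.path.getsize(DB_FILE)} bytes of prepared space")
```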