WelshFruitSnacks

Member
  • Posts

    11
  • Joined

  • Last visited


WelshFruitSnacks's Achievements

  1. I'm putting some thoughts I had out to see if someone else has a different opinion, and I'm assuming anyone reading this already understands how chips are made, plus the terminology. It's readily apparent that advancing to further nodes will require some more exotic technologies to be put into practice. Intel's 22nm saw the commercial introduction of FinFETs, which were then adopted by TSMC and Samsung for their 16/14nm nodes. Samsung is planning GAA for 3nm, with TSMC and Intel to follow at 2nm/20A. That's only ~6 nodes before moving into a new transistor geometry, versus the tens of nodes spent on planar geometries. Advancements in material science, lithography, mask creation, and other fields are slowing down. Brute-forcing density is quickly becoming uneconomical: cost per transistor at advanced nodes is now higher than at previous nodes, a reversal of decades of cheaper transistors with every node advancement.

     Planar -> FinFET -> GAA is a pretty natural progression as you increase gate control over the transistor channel in more dimensions. What comes after? Most speculation shows packaging is the area the semi majors will (and already do) innovate in. GAA refinement will fizzle out in 3-4 nodes, and switching to a new semiconductor is currently not in the cards as far as the industry can see. Most semiconductors aside from silicon are either too expensive or not as balanced as silicon in carrier mobilities for making CMOS (complementary) logic circuits (GaN, for example, has excellent n-type mobility but poor p-type). Logic families that can work on non-CMOS semiconductors are less energy efficient (e.g., dynamic logic). Solutions to these problems aren't anywhere near commercial viability versus what silicon will still be able to do by the time we actually need a better semiconductor. Lastly, it's hard to produce other semiconductors as wafers large enough, and cheap enough, for existing equipment.

     What kind of packaging will the semi majors look into? We already see the rapid expansion of multi-die systems on interposers: AMD with chiplets, Intel with tiles, and Apple with the M1 Ultra, which bonds two M1 Max dies, similar to Intel's tiles. This paradigm would see advancements in interposer assembly, packing more wires into the interposer so more functions can communicate across it. AMD is also looking at die stacking with V-Cache, but thermal limitations are a problem. Heat escapes out of the bulk side of a die, and bonding two dies together means one of the dies has to face toward the motherboard/PCB/etc. So the motherboard-facing die probably has to generate minimal heat, or systems need to be redesigned so thermal solutions can mount heatsinks to both sides of a PCB. Die stacking also reduces yields, since two passing dies become failed dies if you mess up bonding them because of pad misalignment (see the rough yield sketch at the end of this post). And afaik aligning pads is an issue, so pads can't be too small, which also limits I/O, but for some applications you can get away with it.

     Something no fab company has taken a stab at yet is borrowing from NAND flash processes to create dies with multiple transistor layers. But while this works for NAND flash, it has many issues for logic circuits. For one, you can't grow good mono-crystalline silicon on top of the mess of metal and dielectric that makes up the wiring of a silicon die. Two, heat issues: that transistor layer will be surrounded by insulators and will quickly hit 100+ °C if it's too dense. Now you might say GAA processes need floating silicon channels, so they must need to grow silicon to do that.
     However, what probably happens is that good mono-crystalline floating channels are cut out of the bulk silicon layer rather than being grown. Overall, I see us running out of major ways to pack transistors more densely with favorable economics at the package level within the next 2-3 decades at the earliest, and the nature of computing will shift toward more ASIC or DSIC (domain-specific IC) products. Or we throw out the modular systems we have today and everything becomes an SoC or an SoI (system on interposer). So upgradability and replaceability will have to be pushed aside if you want more performant/efficient computing for a given cost. A very unlikely option is that fabs start to mass-implement ion-beam technology for commercial manufacturing; it's slower, but if the machines get cheap enough it could work out. Ion-beam tech is used for research purposes to prototype advanced transistor geometries but is unsuitable for mass wafer-level manufacturing, at least for now...
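     To put a rough number on the die-stacking yield point above, here is a minimal sketch. Every figure in it is made up for illustration and isn't from AMD or any foundry; the only point it demonstrates is that one bad bond scraps two dies that already passed wafer sort.

# Illustrative sketch only -- all numbers are assumptions, not foundry data.
# Point from the post: with die stacking, a single bad bond scraps TWO known-good dies.

def dies_burned_per_good_stack(bond_yield: float) -> float:
    # Each stacking attempt consumes 2 known-good dies; only bond_yield of attempts survive.
    return 2.0 / bond_yield

for bond_yield in (0.99, 0.95, 0.90):
    print(f"bond yield {bond_yield:.0%}: "
          f"{dies_burned_per_good_stack(bond_yield):.2f} known-good dies per shipped stack")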
  2. The only way AMD, Intel, and/or Nvidia could compete with Apple at the monolithic, energy-efficient CPU+GPU+accelerator SoC game is if they unified on chiplets, and systems wouldn't be configurable anymore: you would just buy a chiplet combination on an interposer and throw it into a system with some DRAM.

     Also, I wouldn't say Apple is really that much better than the hardware majors (AMD, Intel, Nvidia) at hardware design wins. The GA102 GPU is 628.4 mm² and the 3090 costs $1,500 for something that doesn't even have all the cores working, on a more mature process. The M1 Ultra is ~840 mm² at 5nm; half the new Mac Studio's BOM cost must be the die itself, since yields can't be that high at that size. Apple doesn't need a huge profit margin on its hardware since it has a software ecosystem, whereas the hardware majors have to make their profits off the hardware itself, for the most part. So Apple can afford to throw area at designs in a way the hardware majors can't. Plus, with their ecosystem lock-in they can easily get consumers to justify the high price tags.

     Additionally, if you can justify wider, more parallel architectures in terms of area cost/yield at the same performance, then you can improve efficiency by running at lower clock speeds, which allows lower voltages, which translates to less energy for the same operations. Power = activity * capacitance * voltage^2 * frequency; the voltage term is quadratic, so lower voltages are much more efficient to run at (there's a quick back-of-the-envelope sketch after this post). Apple's massive area budgets are partially a result of them trying to create chips that fit into very thermally limited designs, so they design wider cores that run a bit slower for similar if not better performance. But they have different economics that allow them to make such huge chips.

     AMD/Intel/Nvidia don't care about your power bill, so they are very happy to reduce area and run faster at higher voltages than the M1 does, optimizing perf versus area/yield, since once they make the sale there isn't a profit incentive for them to care about power on your end. The 350W 3090 is an example... This is why datacenter CPUs run slower: power goes into datacenter TCO, so they are designed with a bit more area, more cores, slower clocks, and lower voltage. The only power restrictions for Intel/AMD/Nvidia are the limits system integrators impose.
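     A minimal back-of-the-envelope sketch of the dynamic-power argument above, using the classic P = activity * C * V^2 * f model from the post; every voltage, frequency, and capacitance value below is an assumption chosen only to show the direction of the effect.

# Back-of-the-envelope sketch, not measured data. Classic dynamic-power model.
def dynamic_power(activity: float, cap: float, voltage: float, freq: float) -> float:
    return activity * cap * voltage**2 * freq

# Hypothetical narrow core pushed to a high clock and voltage.
p_fast = dynamic_power(activity=0.2, cap=1.0, voltage=1.10, freq=4.0e9)

# Hypothetical wider design: twice the switched capacitance (more area),
# half the clock and a lower voltage, aiming for similar throughput.
p_wide = dynamic_power(activity=0.2, cap=2.0, voltage=0.80, freq=2.0e9)

print(f"narrow/fast: {p_fast:.2e} (arbitrary units)")
print(f"wide/slow:   {p_wide:.2e} -> about {p_wide / p_fast:.0%} of the power for similar work")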
  3. Let's look at how much the US ban can actually hurt Huawei aside from just the fab side. On the chip development side, all three major EDA companies (Synopsys, Cadence, Mentor Graphics) are American. China would need to develop and validate decades' worth of software tooling if/when Huawei loses access to those tools. Which isn't easy, since those tools are still buggy (personal experience). Plus, those companies are only where they are because they had to beat or acquire everyone else in a competitive market. China would probably only have a single company working on a replacement, reducing its chances of reaching equal quality, unless it simulated a competitive environment at the increased cost of developing multiple versions for a single winner.

     China could also lose access to ARM architectures if ARM (a British firm) denies them a license. While an ISA is "relatively" easy to come up with, the compilers, software, and operating systems are all issues that have to be resolved. This would also force Chinese technology to be incompatible with the rest of the world. Like, what if all the standards organizations just stop working with China (VESA, IEEE, and other tech consortia)? Just in terms of pure work that needs to be done, even if you have all the manpower and know-how in the world, it would take a decade to catch up.

     Now back to the main argument on fab tooling. China, without the data that Western countries have on individual pieces of tech (lenses, wafer manufacture, dielectrics, air-gap tech...), would take ages to calibrate any machines it could make. Not to mention that even having that data isn't as useful as you might think, because a lot of the calibration depends on the specific environment the machines are operating in. The same process in Korea and the US can have wildly different yield rates because of the tuning for moisture, dust, specific air composition, local gravity, EM background noise... Sure, China has the expertise and resources, maybe even enough corporate espionage, to get the base products made in a couple of years. They would still have to take the time to account for and figure out all the local variables to get the fabs to actually produce decent results when they start producing chips. Sure, you can make these things in a small-scale lab, but the variability at larger scales will kill your commercial viability. We've been able to "make" smaller-scale transistors for years with ion deposition on 4-inch research wafers; getting these things to work properly with production deposition equipment is a different ball game.

     Research by itself isn't enough: pretty much everyone has access to the papers you pointed out (academic research), so why are there so few players in this space? If it were commercially viable you would expect to see multiple companies on it. Even Carl Zeiss's competitors and customers, who have the most to gain from replicating the optics, can't do it. China can't just throw research experts and money at the problem and expect to quickly produce the same result. Also, even if the government has unlimited money, they can't exactly have a $400 million cost per item if Western companies can do it for $100 million; they still need to be in the same ballpark for cost reasons. Not to mention that if there is any evidence of corporate espionage or patent theft, they wouldn't even be able to sell products outside of China, limiting the market to purely domestic sales/uses. Huawei would still be fucked.
  4. Well, even 180nm is still in production at most major foundries, since it's a popular node for analog circuits, which don't really want to deal with the sub-micron effects of smaller nodes. Not everything needs to run at 1 GHz+, which is why it's a popular choice for microcontrollers, some of which might only run at 32 kHz, possibly as battery-powered sensor controllers. If you want, you can still go 1 GHz+ on a node like that with a deep enough pipeline, as the 180nm Pentium III showed, so it's good enough for accelerators.

     130nm is relatively cheap and has high yields, making it good for academic/custom ASIC workloads. As far as I can tell, a shuttle slot at 16 mm² would be about $70K for 100 chips, which is about $700 per chip on a single shuttle run; that isn't beyond the realm of possibility for a small team to invest in ASICs (remember, a Quadro is $5K). Now, you would need to pay at least 5 people to design it from the ground up, or 3 people to implement an open-source design, but <$2K per chip isn't bad (rough cost math after this post). Especially since you can then do a full production run of it at a super low per-unit cost once you know it works. It could also be used by academics to save grant money when doing proof-of-concept devices. Custom memory controllers for a flash array to handle in-memory compute would be an interesting use case.

     The smallest node used for flash is 16nm, at Samsung and SK Hynix; everyone else on flash is still at around 20nm. Only top-of-the-line logic like FPGAs, GPUs (RTX was on 14nm/12nm-class FinFET), and CPUs is prevalent below 20nm. You don't redesign products that already work for smaller nodes, for lots of reasons, mostly economic, but also for things like redundant computing, which requires low failure rates and/or longevity. Smaller nodes could have longevity issues that we just don't know about yet.
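     Rough cost math behind the "<$2K per chip" figure above. The shuttle price and chip count come from the post; the team size, salary, and schedule are assumptions chosen purely for illustration.

# Rough cost sketch: shuttle price and chip count are from the post above;
# team size, salary, and schedule are illustrative assumptions only.

shuttle_cost = 70_000     # $ for a 16 mm^2 slot, ~100 packaged chips (from the post)
chips = 100
engineers = 3             # assumed: small team adapting an open-source design
loaded_salary = 150_000   # assumed fully loaded annual cost per engineer, $
design_months = 3         # assumed schedule

nre = engineers * loaded_salary * design_months / 12
total = shuttle_cost + nre
print(f"Silicon only:     ${shuttle_cost / chips:,.0f} per chip")
print(f"With engineering: ${total / chips:,.0f} per chip")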
  5. Depends on the program. Single-threaded stuff is actually hurt by multithreading sometimes. It's more for when you have two unrelated programs, so your (out-of-order) CPU core can keep executing code even if one thread has a cache miss or a long dependency chain to resolve. There was an (in-order) processor that had 8 threads in 1 core, but each thread only ran at 1/8th the speed of the CPU, because each thread's instruction would occupy a separate pipeline stage and exit the pipeline before that thread's next instruction entered. So multithreaded workloads never see dependency stalls, and you essentially have a 100%-utilized core, but single-threaded stuff can run at most at 1/8th of the theoretical performance (see the toy round-robin sketch below).
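     A toy sketch of that round-robin (barrel) threading scheme; the thread count is from the post, the cycle count is arbitrary, and the model only counts issue slots, ignoring everything else.

# Toy model of barrel-style, round-robin threading as described above.
THREADS = 8
CYCLES = 80

issued = {t: 0 for t in range(THREADS)}
for cycle in range(CYCLES):
    t = cycle % THREADS   # round-robin selection: each thread gets 1 issue slot every 8 cycles
    issued[t] += 1        # no stalls: a thread's next instruction enters only after its last one exits

print(f"Core issued {sum(issued.values())} instructions in {CYCLES} cycles (fully utilized)")
print(f"A single thread got only {issued[0]} of those slots = 1/{THREADS} of the core's throughput")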
  6. Idk about cooling, but you are more resistant to radiation bit-flips that way, as long as you make sure the ceramic itself contains less radioactive material.
  7. Summary

     Google is trying to open up the chip design market by partnering with SkyWater Technology Foundry (formerly owned by Cypress Semiconductor) to release their 130nm PDK on GitHub. Other contributors include OSU and Efabless. This includes standard cell libraries, a RAM compiler, DRCs, and the additional tooling needed to take RTL to the GDS level. The primary PnR (place and route) tool is Magic, and they are trying to tape out striVe RISC-V SoCs with this kit as a test. I don't remember what they are planning for low-level synthesis. Currently, analog cells and primitives are unavailable: no PLLs, no ADCs or DACs. I guess you could see if the RAM compiler can spit out sense amps you could build an ADC with. In the YouTube livestream, Tim Ansell announced Google would be sponsoring a shuttle in November and maybe more in the future. Designs can be up to 10 mm² and each shuttle will include 40 designs. The designs will be taped out and packaged at no cost, but you must be willing to publish everything from RTL to GDS on GitHub under an open-source license.

     Quotes

     My thoughts

     As a physical design engineer, I think this is going to be fun if it takes off, since I can learn about other parts of my work independently. Improvements to the standard cell library, working on timing and correlation, lots of possibilities. Someone might just want to submit a design full of ring oscillators, one per standard cell, to correlate the library with measured silicon (a quick sketch of that idea follows this post). For software-focused engineers, you can now start thinking about ASICs for your own workloads.

     Sources

     https://github.com/google/skywater-pdk
     https://hackaday.com/2020/06/25/creating-a-custom-asic-with-the-first-open-source-pdk/
     https://antmicro.com/blog/2020/06/skywater-open-source-pdk/
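     The ring-oscillator correlation idea above boils down to this relation: measured ring frequency gives an average per-stage delay to compare against library timing numbers. The 120 ps stage delay below is a made-up example, not a value from the SkyWater library.

# Sketch of the ring-oscillator correlation idea; the stage delay is an assumption.
def ring_osc_freq(stages: int, stage_delay_s: float) -> float:
    # An odd-length inverter ring oscillates with period = 2 * N * t_stage.
    return 1.0 / (2 * stages * stage_delay_s)

t_stage = 120e-12  # assumed 120 ps average stage delay for some 130nm cell
for n in (11, 51, 101):
    print(f"{n}-stage ring: {ring_osc_freq(n, t_stage) / 1e6:.1f} MHz")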
  8. Exactly. ARM isn't really that RISC anymore, since it has 200+ instructions now, and x86-64 isn't true CISC, since the most complex instructions get broken down into multiple smaller ops internally. But one benefit for Intel is memory (instruction fetch) bandwidth, since CISC instructions take fewer bytes for the same workload. When pipelines were short, you traded off how much you could do per instruction against clock speed, but with deeper pipelines (and out-of-order execution) you no longer have to trade off how much work you do per cycle, since you are now limited by how well you can branch predict and bring in "OPs". So with shorter instructions containing more "OPs", x86 wins out in this regard on a per-core basis (rough fetch-bandwidth arithmetic after this post). Now, you do trade off chip area and power in the decode stages, but you are on a desktop/higher-power platform with deeper-pocketed customers, so who cares.

     Apple can't force the rest of the world to use ARM, and stuff that is already optimized for x86 isn't going to be moved to ARM if it doesn't need to be. Just like how the financial industry isn't going to rewrite FORTRAN or COBOL into Python, or whatever the hottest new language is. To answer the title of the video, "Is Apple's Betrayal the END of Intel?": unless you can get strong ARM support outside of the Apple ecosystem, where control of the stack is split across multiple entities from software to hardware, you aren't going to kill Intel, because compatibility is why x86 has been around so long anyway. Synopsys or Cadence aren't going to recompile their programs for ARM unless customers have ARM machines. Customers that aren't in the Apple ecosystem aren't going to have ARM machines unless there is robust software for them. And unless there is robust software for the ARM machines, few people are going to make consumer ARM platforms. So without a monolithic entity to force it on devs, it's not going to kill Intel any time soon.
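     The arithmetic behind the code-density point above. The average x86-64 instruction length here is an assumption reflecting the post's claim, not a measurement; AArch64 instructions really are a fixed 4 bytes, and the IPC and clock are arbitrary.

# Fetch-bandwidth arithmetic; the x86 average instruction length is an assumption.
def fetch_bytes_per_sec(ipc: float, freq_hz: float, avg_insn_bytes: float) -> float:
    # Instruction-fetch traffic the front end has to sustain.
    return ipc * freq_hz * avg_insn_bytes

ipc, freq = 3.0, 4.0e9  # assumed sustained IPC and clock
for name, avg_bytes in (("x86-64 (assumed average)", 3.5), ("AArch64 (fixed)", 4.0)):
    gb_s = fetch_bytes_per_sec(ipc, freq, avg_bytes) / 1e9
    print(f"{name}: ~{gb_s:.0f} GB/s of instruction fetch")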
  9. As far as the history of this chip goes, it should be based on an AMD laptop GPU of some sort that also uses RDNA, cut down to mobile specs. Something to be worried about is the power consumption of this chip. Most mobile GPUs power down between frames, meaning they are only on for the fraction of the frame time it takes to render; this is a huge power-saving requirement. But because this design comes from a laptop chip, which doesn't really need to worry about turn-on time, adding that as a feature could be problematic: this could be a much better part performance-wise but kill batteries pretty quickly. I would probably be inclined to take these numbers with a grain of salt, considering the last time Samsung tried to make a mobile GPU it was a god damn disaster. But the project is set to run for at least 5 years, so there isn't a reason to be falsifying these numbers.
  10. So this is "probably true". Not exactly sure which devices are going to be affected, but possibly S12+ devices will no longer ship a fully custom CPU but rather a hardened ARM core. Other companies, like Microsoft, have already heard and are trying to hire the talent, so maybe they'll try developing their own SoC, perhaps for the Surface Pro X that currently uses a Qualcomm device.

      https://www.extremetech.com/mobile/299438-cpu-layoffs-samsung-semiconductor

      The article references a single canceled project, which was the main project at the center, so "maybe" 60% of the center will be laid off; this only affects the CPU division. Hardening will probably be done in Korea, but no one knows at this point. "Technically" no one is laid off yet. Related to this: both the (not anymore) custom CPU and the (now RDNA) GPU are being developed there. Samsung is probably going the Qualcomm route, since Qualcomm also does ARM hardening instead of fully custom CPUs. Personally, I feel they were handicapped by Samsung's process, which is somewhat inferior to TSMC's, but that is hard to back up with concrete numbers.