Traditionally, the FP32 cores are what NVIDIA counts when they mention CUDA cores. If you look at the A100, it has 6912 FP32 cores and 3456 FP64 cores, but since the A100 does 2x FP16 in each FP32 core and 4x FP16 in each FP64 core, that means there's a total of 27648 FP16 cores. According to NVIDIA, the A100 achieves 78 TFLOPS of FP16 performance at 1410 MHz. You can calculate the TFLOPS by multiplying the core count by the clock speed and then by 2 (two ops per cycle via FMA), which turns out to be 27648 * 2 * 1410 = 78 TFLOPS. Now, if I'm correct that what they're counting on the RTX 3080 are the FP16 cores, then running the 8704 figure through the same formula should reproduce the FP16 performance they quoted:
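That A100 arithmetic can be sanity-checked in a few lines of Python. Note the 2x/4x FP16-per-core ratios are this post's working assumption, not an official datasheet spec:

```python
# Rough sanity check of the A100 FP16 math (figures from the post above).
fp32_cores = 6912
fp64_cores = 3456

# Assumed ratios: 2x FP16 per FP32 core, 4x FP16 per FP64 core.
fp16_cores = fp32_cores * 2 + fp64_cores * 4

clock_mhz = 1410
# 2 FLOPs per core per cycle (fused multiply-add); divide by 1e6 for MHz -> TFLOPS.
fp16_tflops = fp16_cores * 2 * clock_mhz / 1e6

print(fp16_cores)                 # 27648
print(round(fp16_tflops, 1))      # ~78.0, matching NVIDIA's quoted figure
```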
Calculating again by multiplying:
8704 * 2 * 1410 = 24.54 TFLOPS (FP16 performance at 1410 MHz)
*Note though that the 30 TFLOPS they mentioned was calculated at 1710 MHz, which is the boost clock.
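The same formula, as a small sketch, applied at both clocks:

```python
def tflops(cores, clock_mhz):
    # 2 FLOPs per core per cycle (fused multiply-add); 1e6 converts MHz to TFLOPS.
    return cores * 2 * clock_mhz / 1e6

base = tflops(8704, 1410)    # ~24.5 TFLOPS at 1410 MHz
boost = tflops(8704, 1710)   # ~29.8 TFLOPS, i.e. NVIDIA's "30 TFLOPS" at the boost clock
print(f"{base:.1f} {boost:.1f}")
```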
The calculations all match up, which means... the RTX 3080 actually has 2176 CUDA cores in the traditional sense, counting only the FP32 cores. That follows if the 3080 keeps the same layout as the A100, where the FP16 core count works out to 4x the FP32 count: 8704 / 4 = 2176. Calculating the FP32 performance at 1710 MHz gives 2176 * 2 * 1710 = 7.44 TFLOPS. I believe that's why they refused to mention the actual CUDA core count, since to most people it would look like a downgrade. But since they were able to achieve a 2x FP32 IPC improvement, that puts the effective FP32 performance at ~15 TFLOPS when compared to the previous generation.
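And the implied traditional core count, under the assumption (mine, extrapolated from the A100 in the post) that Ampere keeps the same layout (FP64 cores = half the FP32 cores, so FP16 "cores" = 4x FP32):

```python
# Assumption: same core layout as the A100, so FP16 "cores" = 4x FP32 cores.
fp16_cores = 8704
fp32_cores = fp16_cores // 4                 # 2176 traditional CUDA cores

boost_mhz = 1710
fp32_tflops = fp32_cores * 2 * boost_mhz / 1e6   # ~7.44 TFLOPS raw FP32
with_2x_ipc = fp32_tflops * 2                    # ~14.9, the "~15 TFLOPS" effective figure

print(fp32_cores, round(fp32_tflops, 2), round(with_2x_ipc, 1))
```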
In realistic scenarios, though, the card would most likely boost way higher than 1710 MHz, and the IPC improvement is likely slightly higher than 2x, which would put it more in line with the performance metrics we've seen against the RTX 2080.