
AMD faces class action suit over Bulldozer misrepresentation

zMeul

HSA brings many benefits, like power efficiency, effective throughput and lower latencies. It can be used everywhere from embedded, semi-custom, HPC and servers to desktop and mobile.

HSA has many mobile players as partners, so it might gather more attention there.

HSA still isn't used that much.

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL


HSA brings many benefits, like power efficiency, effective throughput and lower latencies. It can be used everywhere from embedded, semi-custom, HPC and servers to desktop and mobile.

HSA has many mobile players as partners, so it might gather more attention there.

Yeah, from what I can gather, it's mostly going to benefit the mobile space, as CPUs and GPUs can be interchanged between chip designers, knowing that memory controllers, GPUs, CPUs, decoders and whatnot will seamlessly integrate regardless of who made what.


HSA still isn't used that much.

HSA still hasn't been put into actual large-scale use... no really, the first HSA designs for mobile with actual practical applications aren't due before next year, or even later if there are delays.

 

Carrizo APUs are fully HSA compliant and do feature ARM cores. However, I am not sure how big a part HSA played in getting those ARM cores to work alongside the x86 CPU.


Yeah, from what I can gather, it's mostly going to benefit the mobile space, as CPUs and GPUs can be interchanged between chip designers, knowing that memory controllers, GPUs, CPUs, decoders and whatnot will seamlessly integrate regardless of who made what.

Yes, HSA is ISA-agnostic.

Looking at the semi-custom, embedded and HPC markets, it might also have several benefits there.

 

 

HSA still hasn't been put into use... no really, the first HSA designs for mobile with actual practical applications aren't due before next year, or even later if there are delays.

First, you would need to develop the tools to develop the applications. One step at a time. AMD has been working on it.

OpenSUSE (a Linux distro) should be looking into HSA.

Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO


-snip-

There's a fork of OBS now that makes use of AMD's VCE tech. So you have a lot of options in how you want to set up your encode.

 

I have not tested it, but I wonder if the built-in H.264 VCE path provides better quality or speed than x264 utilizing OpenCL on the GPU.


There's a fork of OBS now that makes use of AMD's VCE tech. So you have a lot of options in how you want to set up your encode.

 

I have not tested it, but I wonder if the built-in H.264 VCE path provides better quality or speed than x264 utilizing OpenCL on the GPU.

I didn't know that existed. Got a link? I might do some comparisons when I get some spare time.

Ehh.... What are you talking about? I think you have misunderstood something completely here. Sandy Bridge supports both encoding and decoding of H.264 in fixed-function hardware.

 

I don't think it was as simple as some compiler options being incorrect. It's the fact that things like a Core 2 Duo will have a hard time decoding high-bitrate 1080p or 4K H.264 in software. It doesn't even have to be a Core 2 Duo. A lot of Atom processors and very low-clocked Core processors (like Core M) might not be up for the task either. I don't have any Atom or Core M devices to test it with, but I was very active in the video community around the time H.264 was getting popular, and a huge number of users were reporting abysmal performance.

 

 

I think you missed the part where I said "physics" as well. Surely even you will agree that running physics calculations on the GPU is a good idea. In fact, Microsoft recently bought Havok (from Intel) which can use the GPU for physics.

 

 

Then why are you saying things along the lines of "it's fast enough"? You said it twice in your response. Once for compression algorithms and once for video decoding.

Except that makes no sense, since decoding (decompression, really) is an embarrassingly parallel problem (unless your algorithm was stupidly designed so that each bit manipulation is dependent on the previous). Suppose you have a few tens of millions of bits coming through per second: 1920x1080 with 8 bits per pixel per second (which is way higher than most available bit rates for 1080p content) is ~16 million bits. Well, Sandy Bridge on 128-bit AVX alone has 4 * (128/32) * 2 * 3.4*10^9 = 108.8 GFLOPS of performance at its disposal. That's over one hundred billion manipulations. Even if you had to do ~8,000-10,000 manipulations per bit in raw AVX (laughable), Sandy Bridge would have the horsepower to get through it just fine, and that's before the iGPU and transcoding blocks get involved. Those would only add to the compute power. If H.264 transcoding is actually suffering on Sandy Bridge, it's a bad algorithm or compiler option. There's no excuse.
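For reference, here's a minimal C++ sketch of that back-of-the-envelope figure (the inputs are just the assumptions above: 4 cores, 128-bit vectors of 32-bit floats, 2 vector operations per cycle, 3.4 GHz):

#include <cstdio>

int main() {
    // Theoretical peak throughput = cores x SIMD lanes x vector ops per cycle x clock.
    const double cores         = 4.0;
    const double lanes         = 128.0 / 32.0;  // 128-bit vectors of 32-bit floats
    const double ops_per_cycle = 2.0;           // e.g. one vector add + one vector multiply per cycle
    const double clock_hz      = 3.4e9;

    const double peak = cores * lanes * ops_per_cycle * clock_hz;
    std::printf("Theoretical peak: %.1f GFLOPS\n", peak / 1e9);  // prints 108.8
    return 0;
}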

 

I say that because throwing more power at the problem is not always (and is in fact rarely) the best solution. Of the people who compress a lot of files, most tend to have workstations with beefier CPUs than the average Joe anyway, and they'll have the professional software to do it correctly. They are power users. For the average Joe, the difference between 1.3 seconds to compress a file and 1.7 is negligible.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Except that makes no sense, since decoding (decompression, really) is an embarrassingly parallel problem (unless your algorithm was stupidly designed so that each bit manipulation is dependent on the previous). Suppose you have a few tens of millions of bits coming through per second: 1920x1080 with 8 bits per pixel per second (which is way higher than most available bit rates for 1080p content) is ~16 million bits. Well, Sandy Bridge on 128-bit AVX alone has 4 * (128/32) * 2 * 3.4*10^9 = 108.8 GFLOPS of performance at its disposal. That's over one hundred billion manipulations. Even if you had to do ~8,000-10,000 manipulations per bit in raw AVX (laughable), Sandy Bridge would have the horsepower to get through it just fine, and that's before the iGPU and transcoding blocks get involved. Those would only add to the compute power. If H.264 transcoding is actually suffering on Sandy Bridge, it's a bad algorithm or compiler option. There's no excuse.

1) Why do you keep bringing up software decoding with Sandy Bridge? Not everyone has a Sandy Bridge CPU. I specifically said Atom and Core M in my post and yet you keep going back to "but Sandy Bridge should be able to do it".

 

2) You don't understand how video codecs work. They don't use RGB and they are not as simple as just saying "this pixel will be this color". They are far more complicated than that.

 

3) Now you're bringing up transcoding as well, and let me tell you, transcoding is extremely demanding and it is NOT a problem of just a "bad algorithm or compiler option". It takes me about an hour to transcode a 1 minute clip to HEVC (placebo preset). Are you going to tell me that it should take a minute or less but it's just a problem with the compiler settings whoever compiled the program used?

 

 

 

I say that because throwing more power at the problem is not always (and is in fact rarely) the best solution. Of the people who compress a lot of files, most tend to have workstations with beefier CPUs than the average Joe anyway, and they'll have the professional software to do it correctly. They are power users. For the average Joe, the difference between 1.3 seconds to compress a file and 1.7 is negligible.

That makes no sense. First you say that throwing more power at the problem is not always the best solution, then you say that compression performance doesn't matter because people who need higher performance can just get beefier CPUs.

"We don't need higher performance" is a terrible argument that you should never use so please stop it.


1) Why do you keep bringing up software decoding with Sandy Bridge? Not everyone has a Sandy Bridge CPU. I specifically said Atom and Core M in my post and yet you keep going back to "but Sandy Bridge should be able to do it".

2) You don't understand how video codecs work. They don't use RGB and they are not as simple as just saying "this pixel will be this color". They are far more complicated than that.

3) Now you're bringing up transcoding as well, and let me tell you, transcoding is extremely demanding and it is NOT a problem of just a "bad algorithm or compiler option". It takes me about an hour to transcode a 1 minute clip to HEVC (placebo preset). Are you going to tell me that it should take a minute or less but it's just a problem with the compiler settings whoever compiled the program used?

That makes no sense. First you say that throwing more power at the problem is not always the best solution, then you say that compression performance doesn't matter because people who need higher performance can just get beefier CPUs.

"We don't need higher performance" is a terrible argument that you should never use so please stop it.

I bring it up because it's a 4-year-old architecture. Even Atom processors these days have about 1/3 that power, which is still 200+ manipulations per bit in capability. Mobile processors have recently been undergoing the sort of performance leaps we were seeing in the Pentium 2/3 days. If you use an Atom built today for 1080p streaming, you should be fine: 4 * (128/32) * 2 * 1.5*10^9 = 48 GFLOPS for a midrange Bay Trail SoC.

I know exactly how they work. They're real-time delta color decompression. It's embarrassingly parallel unless someone decided on a lousy algorithm for the compression. Transcoding is an umbrella term under which fall encoding and decoding.

I'm telling you it's a bad algorithm, because 1 minute of raw 4K footage is 3*3840*2160*60 = 1.492 billion bytes with no initial compression at all (which would be obscene), yet it's still well within the budget of a midrange Atom to compress in a single second. Delta color compression, both intra- and inter-frame, is possible with parallel coordinates which don't require you to decompress one block of a frame before another. The next frame's block coordinates are relative to the first, but you can check all the transforms in parallel. We covered this in my algorithms class (which I aced). If codecs are having problems on chips with more than 10x the manipulation capacity relative to the data input size, it's a bad algorithm or a bad compiler setting where vectorized optimizations aren't being used, because a video is nothing more than a series of color transformations in pixel coordinates. It is the very definition of embarrassingly parallel per frame, and the move between frames can have the relative coordinate transforms done in embarrassingly parallel fashion as well.

No, now you're twisting my words. I'm saying in terms of dedicated transcoding hardware, each CPU generation does get better, but the reality is CPUs overall have the performance capability to do video transcoding in raw software in real time. Even a 1-core Celeron at 2GHz can do it in the Nehalem generation and later, at least for 1080p.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


...

 

I think the problem is, Patrick, that your information goes over most people's heads, including mine. I'm willing to accept my ignorance on the subject; others might feel the need to compensate.

Don't think this discussion will ever be resolved, so it might be prudent to debate whether selling 4M/8T as '8 cores' is justified? (Not demanding, just asking.)


I think the problem is, Patrick, that your information goes over most people's heads, including mine. I'm willing to accept my ignorance on the subject; others might feel the need to compensate.

Don't think this discussion will ever be resolved, so it might be prudent to debate whether selling 4M/8T as '8 cores' is justified? (Not demanding, just asking.)

I honestly think there's a chance AMD will lose this one. It's not about the shared FPU as much as it is about the shared prefetcher. The two "cores" don't have the ability to act independently of each other in independent tasks, and each CPU core is itself by definition a CPU. If the CPU can't act on its own, it's not a CPU. Additionally, that argument could extend to the FPU, with each core not being able to dispatch FP instructions at the same time, but I think the biggest angle this lawsuit could come up with would be execution independence being hindered by the shared prefetch unit (which was abandoned in either Piledriver/Vishera or Steamroller, I can't remember which). I honestly think AMD should have gone with 4M/8T, both to be technically correct and to avoid the horrible PR they've gotten since.

 

Think of a frame of a video as nothing more than a 2D array (really it's a 1D array with an access function, but that only complicates matters). You can compress the frame by, instead of storing all 3 or 4 bytes per pixel, starting with the 3 bytes of the first pixel and then using Huffman coding to tell you how much each successive pixel changes from the previous one. This can be done in parallel using AVX instructions to get the differences for each color for each pair of pixels.
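To make that concrete, here's a minimal sketch of just the delta step in C++ (scalar for clarity; the per-channel subtraction is what you would vectorize with AVX, and the function names are only illustrative):

#include <cstdint>
#include <vector>

// Delta-encode one scanline of RGB pixels: keep the first pixel as-is and store
// every later byte as the difference from the same channel of the previous pixel.
// Wrap-around arithmetic on uint8_t keeps this lossless.
std::vector<uint8_t> delta_encode(const std::vector<uint8_t>& rgb) {
    std::vector<uint8_t> out(rgb.size());
    for (size_t i = 0; i < rgb.size(); ++i)
        out[i] = (i < 3) ? rgb[i] : static_cast<uint8_t>(rgb[i] - rgb[i - 3]);
    return out;
}

// Decoding reverses the deltas. Note that, as written, each pixel depends on the
// previously reconstructed one, which is the serialization problem described below.
std::vector<uint8_t> delta_decode(const std::vector<uint8_t>& deltas) {
    std::vector<uint8_t> out(deltas.size());
    for (size_t i = 0; i < deltas.size(); ++i)
        out[i] = (i < 3) ? deltas[i] : static_cast<uint8_t>(deltas[i] + out[i - 3]);
    return out;
}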

 

If you wanted a very simple 1-pass compression, you'd then run this information through Huffman's algorithm and send the prefix tree (unique identifiers for the color deltas, where the most prominent are encoded with the fewest bits and vice versa, giving the maximal compression for a prefix scheme), the starting pixel data, and the encoded prefixes in order. Decompression in this case is going to be relatively slow since we made every single pixel dependent on the previous one, meaning the decoding is forcibly serialized. There are two ways to mitigate this: block-wise compression (which is either lossy or a less than computationally ideal recursive decompression solution), and inter-frame compression. Inter-frame sends the pixel color changes between each frame, which can be lossy or lossless depending on how many bits you want to allow and how dynamically the colors of the frames are changing.
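For the Huffman part, here's a minimal sketch that only computes code lengths from a histogram of the delta bytes; a real encoder would also serialize the tree and emit the bitstream (names are illustrative):

#include <cstdint>
#include <queue>
#include <vector>

// Given how often each delta byte occurs, compute Huffman code lengths:
// frequent deltas end up with short codes, rare ones with long codes.
std::vector<int> huffman_code_lengths(const std::vector<uint64_t>& freq) {
    struct Node { uint64_t weight; std::vector<int> symbols; };
    auto cmp = [](const Node& a, const Node& b) { return a.weight > b.weight; };
    std::priority_queue<Node, std::vector<Node>, decltype(cmp)> pq(cmp);

    for (int s = 0; s < static_cast<int>(freq.size()); ++s)
        if (freq[s] > 0) pq.push(Node{freq[s], {s}});

    std::vector<int> length(freq.size(), 0);
    if (pq.size() == 1) { length[pq.top().symbols[0]] = 1; return length; }

    while (pq.size() > 1) {
        Node a = pq.top(); pq.pop();
        Node b = pq.top(); pq.pop();
        // Merging the two lightest subtrees makes every symbol inside them one bit longer.
        for (int s : a.symbols) ++length[s];
        for (int s : b.symbols) ++length[s];
        Node merged{a.weight + b.weight, a.symbols};
        merged.symbols.insert(merged.symbols.end(), b.symbols.begin(), b.symbols.end());
        pq.push(merged);
    }
    return length;
}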

 

The best codecs today use both techniques together, in schemes which should for all intents and purposes be embarrassingly parallel (the algorithms for compression and decompression are much more intricate in this case, but they're based on the same principles as before, though better schemes than Huffman coding have been found). Embarrassingly parallel means that, discounting the overhead of launching threads, throwing more cores at the problem gives 100% scaling. The more parallel you are, the better. The idea that anything as new as Nehalem is struggling is absolutely baffling. My cell phone has more computing power than the entire planet did in the mid-1980s, and back then compression and decompression were even more important, since networking was extraordinarily more expensive. Programs were designed to know as much as possible and send/receive as little as possible while saying as much as possible in that tiny data stream.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


[image: piledriver-3b.jpg] It does have 8 cores, the cores are just in sets sharing resources, and if it were a quad core we would see much better gaming performance, as it would be using 50% of the CPU instead of 25%.


[image: piledriver-3b.jpg] It does have 8 cores, the cores are just in sets sharing resources, and if it were a quad core we would see much better gaming performance, as it would be using 50% of the CPU instead of 25%.

ALUs aren't cores. They are part of them.

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL


ALUs aren't cores. They are part of them.

If it were a quad core, we would be seeing better gaming performance than we are currently getting with FX processors.


I bring it up because it's a 4-year-old architecture. Even Atom processors these days have about 1/3 that power, which is still 200+ manipulations per bit in capability. Mobile processors have recently been undergoing the sort of performance leaps we were seeing in the Pentium 2/3 days. If you use an Atom built today for 1080p streaming, you should be fine: 4 * (128/32) * 2 * 1.5*10^9 = 48 GFLOPS for a midrange Bay Trail SoC.

If you use a modern one, sure. But again, The Scene did not want to change to H.264 until recently (2012) because of hardware issues. H.264 is over a decade old now (hell, even x264 is over a decade old) and did not start taking off until several generations after we got hardware-accelerated H.264 decoding, because of the performance issues. Nvidia added proper H.264 decoding in hardware with the release of Tesla (back in late 2006).

This is an issue that's solved today but wasn't a few years ago, and we are starting to see the same issues again with HEVC.

 

 

I know exactly how they work. They're real-time delta color decompression. It's embarrassingly parallel unless someone decided on a lousy algorithm for the compression.

If you know how they work then why did you start talking nonsense like "1920x1080 with 8 bits per pixel per second"? That's not how video works. It doesn't work with "bits per pixel" when the video is saved because that would be extremely wasteful.

 

 

Transcoding is an umbrella term under which fall encoding and decoding.

Haha, is this going to be an "alpha channel (aka gamma channel)" rerun? Transcoding is not an umbrella term. Transcoding specifically means converting from one format to another (or the same). It involves first decoding data and then encoding it into something else. Transcoding is not an umbrella term for either encoding or decoding.

Maybe @.spider.. wants to see this as well. We got some good laughs last time.

 

 

 

I'm telling you it's a bad algorithm, because 1 minute of raw 4K footage is 3*3840*2160*60 = 1.492 billion bytes with no initial compression at all (which would be obscene), yet it's still well within the budget of a midrange Atom to compress in a single second. Delta color compression, both intra- and inter-frame, is possible with parallel coordinates which don't require you to decompress one block of a frame before another. The next frame's block coordinates are relative to the first, but you can check all the transforms in parallel. We covered this in my algorithms class (which I aced). If codecs are having problems on chips with more than 10x the manipulation capacity relative to the data input size, it's a bad algorithm or a bad compiler setting where vectorized optimizations aren't being used, because a video is nothing more than a series of color transformations in pixel coordinates. It is the very definition of embarrassingly parallel per frame, and the move between frames can have the relative coordinate transforms done in embarrassingly parallel fashion as well.

Good thing you weren't taking a codec class, because you would have failed it. Maybe you should tell Linus that his new server is a total waste because even an Atom can encode his videos in real time (in before you say it can, then use the UltraFast preset to prove something).

You are so extremely out of touch with the things you are talking about. Have you ever even encoded something? It is a lot more demanding than you seem to think.

 

But since you're such an amazing person I bet you could write a better encoder than x264. After all, using that it can take several hours for my 4.4GHz 2500K to encode. According to you it should be able to do it in real time. Here is the source code. If you make it so that I can encode in real time using the placebo preset then I will genuinely call you a genius. Until then I will call you a delusional idiot who talks about things he does not understand.

Right now I am getting about 3 FPS when I am not using my CPU for anything else. That's for H.264 and I get even less with HEVC.


 

 

No, now you're twisting my words. I'm saying in terms of dedicated transcoding hardware, each CPU generation does get better, but the reality is CPUs overall have the performance capability to do video transcoding in raw software in real time. Even a 1-core Celeron at 2GHz can do it in the Nehalem generation and later, at least for 1080p.

Not transcoding, but possibly decoding. I don't know what the average CPU is right now, but a few years ago we did have major issues with people not being able to decode it in software. Hardware acceleration helped tremendously for that, and it will help now that we are moving over to HEVC and VP9 as well.

I have my doubts about the single core 2GHz Celeron, especially if the video is 10bit, but I can't disprove it since I don't have a CPU like that. What I can say however is that my dual core, A15 based 1.7GHz Nexus 10 can just barely keep up with 10bit H.264 at 1080p. The audio sometimes cuts out or gets delayed but other than that it can manage ~24 FPS. The CPU is at 100% load the entire time though and I can forget about heavily stylized subtitles.

 

 

Oh and if I recall correctly all of this is because I said browsers benefited from GPGPU right? You ignored the game with fairly decent graphics I linked which runs in the browser. That's only possible because of the hardware acceleration. On top of that we benefit from it in other areas too. Even if you think the CPU is fast enough for some of the things that are now offloaded to the GPU in browsers, why say no to free performance?


If you use a modern one, sure. But again, The Scene did not want to change to H.264 until recently (2012) because of hardware issues. H.264 is over a decade old now (hell, even x264 is over a decade old) and did not start taking off until several generations after we got hardware-accelerated H.264 decoding, because of the performance issues. Nvidia added proper H.264 decoding in hardware with the release of Tesla (back in late 2006).

This is an issue that's solved today but wasn't a few years ago, and we are starting to see the same issues again with HEVC.

If you know how they work then why did you start talking nonsense like "1920x1080 with 8 bits per pixel per second"? That's not how video works. It doesn't work with "bits per pixel" when the video is saved because that would be extremely wasteful.

Haha, is this going to be an "alpha channel (aka gamma channel)" rerun? Transcoding is not an umbrella term. Transcoding specifically means converting from one format to another (or the same). It involves first decoding data and then encoding it into something else. Transcoding is not an umbrella term for either encoding or decoding.

Maybe @.spider.. wants to see this as well. We got some good laughs last time.

Good thing you weren't taking a codec class, because you would have failed it. Maybe you should tell Linus that his new server is a total waste because even an Atom can encode his videos in real time (in before you say it can, then use the UltraFast preset to prove something).

You are so extremely out of touch with the things you are talking about. Have you ever even encoded something? It is a lot more demanding than you seem to think.

But since you're such an amazing person I bet you could write a better encoder than x264. After all, using that it can take several hours for my 4.4GHz 2500K to encode. According to you it should be able to do it in real time. Here is the source code. If you make it so that I can encode in real time using the placebo preset then I will genuinely call you a genius. Until then I will call you a delusional idiot who talks about things he does not understand.

Right now I am getting about 3 FPS when I am not using my CPU for anything else. That's for H.264 and I get even less with HEVC.


Not transcoding, but possibly decoding. I don't know what the average CPU is right now, but a few years ago we did have major issues with people not being able to decode it in software. Hardware acceleration helped tremendously for that, and it will help now that we are moving over to HEVC and VP9 as well.

I have my doubts about the single core 2GHz Celeron, especially if the video is 10bit, but I can't disprove it since I don't have a CPU like that. What I can say however is that my dual core, A15 based 1.7GHz Nexus 10 can just barely keep up with 10bit H.264 at 1080p. The audio sometimes cuts out or gets delayed but other than that it can manage ~24 FPS. The CPU is at 100% load the entire time though and I can forget about heavily stylized subtitles.

Oh and if I recall correctly all of this is because I said browsers benefited from GPGPU right? You ignored the game with fairly decent graphics I linked which runs in the browser. That's only possible because of the hardware acceleration. On top of that we benefit from it in other areas too. Even if you think the CPU is fast enough for some of the things that are now offloaded to the GPU in browsers, why say no to free performance?

It's not remotely wasteful. That's the highest bit rate for 1080p videos under H.264. I'm giving the worst-case scenario (which can be produced by generating pixel colors from a uniform distribution and throwing them into sequential frames of a video). Everything will be better than that, because everything has some level of compressibility that's block-wise in normal footage situations. I also invite you to provide any sources to the contrary.

No, transcoding requires encode and decode capabilities. The various codecs have different schemes which can't be directly translated without undoing some of the encoding, decompressing the frames somewhat if not completely. Transcoding is the umbrella term for the functions which encode and decode. It can be used to translate between codecs, but that is not its root definition (Kleinberg & Tardos).

I have encoded and decoded. We had to implement and prove the runtime characteristics of a custom algorithm as a 15% project. I implemented a video compression algorithm with a compression ratio of 12:1 and a retention ratio of 98%, running in about 3n*(log n)^2 time (implemented in C++ using the CilkPlus vectorization libraries for the similarity analysis). On my stock 2600K at school, a 40-minute video could be compressed under that scheme in 12 minutes in software and fully decompressed in 8, with a fairly low loss ratio. Now, it's not quite as compact as H.264, but a 10% memory delta isn't bad for a total amateur armed with nothing more than linear algebra and knowledge of modern programming standards. I'm saying we have the hardware to do it. You can prove it to yourself with a little mathematical induction. Software is a decade behind the instructions we have available today, across the board.

I'm not saying no to free performance. I'm saying it doesn't exist at all, and the trade off is stupid when we're not even at the limits of what we have now. The same is true of games being so far behind in multithreaded design. Even now the threads aren't balanced, and a little analysis under a profiler reveals much of the CPU usage is spin locks of the form:

while (mutex_locked) {}   // busy-wait: burns a full core while the lock is held elsewhere

mutex_locked = true;      // claim the lock (note: a check-then-set like this is not atomic)

// handle the race condition code

mutex_locked = false;     // release the lock

And if you don't believe me, write a C++ program that just runs an infinite loop checking a variable. The usage for that core skyrockets to almost 100%.
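If you want to see that for yourself, here's a minimal, self-contained C++ sketch (the flag and thread names are just illustrative); while the worker spins on the variable, one core sits at roughly 100% doing no useful work:

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> done{false};

int main() {
    // The worker spins on the flag exactly like the naive lock above:
    // no sleep, no yield, just re-checking a variable as fast as the core will go.
    std::thread spinner([] {
        while (!done.load(std::memory_order_acquire)) {
            // busy-wait
        }
    });

    // Watch that core in Task Manager / top for the next 10 seconds.
    std::this_thread::sleep_for(std::chrono::seconds(10));
    done.store(true, std::memory_order_release);
    spinner.join();
    return 0;
}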

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


It's not remotely wasteful. That's the highest bit rate for 1080p videos under H.264. I'm giving the worst-case scenario (which can be produced by generating pixel colors from a uniform distribution and throwing them into sequential frames of a video). Everything will be better than that, because everything has some level of compressibility that's block-wise in normal footage situations. I also invite you to provide any sources to the contrary.

No, transcoding requires encode and decode capabilities. The various codecs have different schemes which can't be directly translated without undoing some of the encoding, decompressing the frames somewhat if not completely. Transcoding is the umbrella term for the functions which encode and decode. It can be used to translate between codecs, but that is not its root definition (Kleinberg & Tardos).

I have encoded and decoded. We had to implement and prove the runtime characteristics of a custom algorithm as a 15% project. I implemented a video compression algorithm with a compression ratio of 12:1 and a retention ratio of 98%, running in about 3n*(log n)^2 time (implemented in C++ using the CilkPlus vectorization libraries for the similarity analysis). On my stock 2600K at school, a 40-minute video could be compressed under that scheme in 12 minutes in software and fully decompressed in 8, with a fairly low loss ratio. Now, it's not quite as compact as H.264, but a 10% memory delta isn't bad for a total amateur armed with nothing more than linear algebra and knowledge of modern programming standards. I'm saying we have the hardware to do it. You can prove it to yourself with a little mathematical induction. Software is a decade behind the instructions we have available today, across the board.

I'm not saying no to free performance. I'm saying it doesn't exist at all, and the trade off is stupid when we're not even at the limits of what we have now. The same is true of games being so far behind in multithreaded design. Even now the threads aren't balanced, and a little analysis under a profiler reveals much of the CPU usage is spin locks of the form:

while (mutex_locked) {}   // busy-wait: burns a full core while the lock is held elsewhere

mutex_locked = true;      // claim the lock (note: a check-then-set like this is not atomic)

// handle the race condition code

mutex_locked = false;     // release the lock

And if you don't believe me, write a C++ program that just runs an infinite loop checking a variable. The usage for that core skyrockets to almost 100%.

I think you forget that 10bit sound creates a HUGE issue.

 

I got an old (1st gen) Atom processor in my old HTPC... sure, it could play a 1080p 8-bit MKV with H.264... but if you even consider 10bit, you're gonna BSOD from the CPU hanging itself.


I think you forget that 10bit sound creates a HUGE issue.

I got an old (1st gen) Atom processor in my old HTPC... sure, it could play a 1080p 8-bit MKV with H.264... but if you even consider 10bit, you're gonna BSOD from the CPU hanging itself.

Sound is also a delta-compressible data type, but I'll admit I've never analyzed it and don't know the techniques used today. That said, BSOD? No. Just buffer enough leeway to get started. Nothing should ever cause a BSOD in compression and decompression unless you manage to run out of memory and cause a stack segmentation fault.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Sound is also a delta-compressible data type, but I'll admit I've never analyzed it and don't know the techniques used today. That said, BSOD? No. Just buffer enough leeway to get started. Nothing should ever cause a BSOD in compression and decompression unless you manage to run out of memory and cause a stack segmentation fault.

No, trust me, it BSODs... it locks up for 2-5 minutes, then BSODs with a kernel error. Usually it reboots too fast for me to read the error.

 

My current PC and my old setup with the FX 8320, even my friend's 3-year-old laptop (mobile i7, 2 cores + HT...)

 

 

Sound is compressible, but it is more sensitive than images. Too much or too little will distort the audio when playing back. So if the file itself is compressed too much, the decompression during playback can distort it.

 

I used to help out with fansubbing anime before... I got a long lecture on how I fucked up a file because I selected too high a compression ratio in the software we used.


No, trust me, it BSODs... it locks up for 2-5 minutes, then BSODs with a kernel error. Usually it reboots too fast for me to read the error.

My current PC and my old setup with the FX 8320, even my friend's 3-year-old laptop (mobile i7, 2 cores + HT...)

Sound is compressible, but it is more sensitive than images. Too much or too little will distort the audio when playing back. So if the file itself is compressed too much, the decompression during playback can distort it.

I used to help out with fansubbing anime before... I got a long lecture on how I fucked up a file because I selected too high a compression ratio in the software we used.

There's no reason to BSOD. It's an error in the implementation or a piece of faulty hardware. Even with 10-bit data, C++ provides built-in support for emulating X-bit sizes if they aren't native to your computer. And if you implement your comp/decomp in any other language, you're either a moron or you know something very specific about your use case (iOS runs Swift, which is nothing but a C wrapper).

It is quite sensitive, but the human ear is also far more sensitive than the human eye.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


It's not remotely wasteful. That's the highest bit rate for 1080p videos under H.264. I'm giving the worst-case scenario (which can be produced by generating pixel colors from a uniform distribution and throwing them into sequential frames of a video). Everything will be better than that, because everything has some level of compressibility that's block-wise in normal footage situations. I also invite you to provide any sources to the contrary.

Actually, it is extremely wasteful, and that's why modern video codecs don't do it. There is absolutely no point in saving data on a pixel-by-pixel basis. Formats like H.264 instead use things like slices, motion estimation and motion vectors to change the image. Information is only saved on a pixel-by-pixel basis (and even then it's not a 1:1 mapping) in key frames (also called I-frames). All other frames contain information that is traced back to the key frame. What you are suggesting, saving 8 bits per pixel in every frame, is what RAW video is (except with more bits).
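As a rough illustration of that idea, here's a minimal sketch of motion-compensated reconstruction for one block (names and the fixed 16x16 block size are just illustrative; real H.264 also does sub-pixel motion, transforms the residual, and much more):

#include <cstdint>
#include <vector>

// One grayscale frame as a flat array, plus its dimensions.
struct Frame {
    int w, h;
    std::vector<uint8_t> pix;  // assumed to be pre-sized to w * h
};

// Rebuild a 16x16 block of the current frame: copy the block the motion vector
// (dx, dy) points at in the reference frame, then add the stored residual.
// Only key frames carry full pixel data; predicted frames carry (motion vector,
// residual) pairs like this. Assumes the motion vector stays inside the frame.
void reconstruct_block(const Frame& ref, Frame& cur,
                       int bx, int by, int dx, int dy,
                       const std::vector<int16_t>& residual) {
    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            const int sx = bx + x + dx;               // position in the reference frame
            const int sy = by + y + dy;
            int value = ref.pix[sy * ref.w + sx]      // motion-compensated prediction
                      + residual[y * 16 + x];         // plus the stored difference
            if (value < 0)   value = 0;               // clamp to the valid 8-bit range
            if (value > 255) value = 255;
            cur.pix[(by + y) * cur.w + (bx + x)] = static_cast<uint8_t>(value);
        }
    }
}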

 

Here is a (tiny) explanation of how (a few parts of) H.264 works. I recommend you read it. Modern codecs do not work by defining what color a pixel is.

 

 

There is no such thing as a "highest bit rate for 1080p videos under H.264", because the format does not map the color of each pixel. It uses completely different methods for saving the video.

I just created a video with a bit rate of 41903 Kbps without breaking any of the H.264 specifications. According to you, the maximum bit rate possible for a 1920x1080 file using H.264 would be 16589 Kbps. My file is more than twice that.

(How I got the file so big: Set the quality to RF 0 and then the preset to Ultrafast)

Hell, even my phone records at a higher bit rate than your "maximum allowed".

1920x1080x8 = 16,588,800 bits ≈ 16.6 megabits.

My phone = 17 megabits per second.

 

 

 

 

 

No, transcoding requires encode and decode capabilities. The various codecs have different schemes which can't be directly translated without undoing some of the encoding, decompressing the frames somewhat if not completely. Transcoding is the umbrella term for the functions which encode and decode. It can be used to translate between codecs, but that is not its root definition (Kleinberg & Tardos).

This is hilarious.

Yes, transcoding requires encoding and decoding capabilities, because transcoding is literally when you take one format (such as H.264), decode it and then encode it into something else (which might also be H.264).

 

What you are referring to when talking about "directly translated without undoing some of the encoding" is called remuxing. That's unrelated to encoding and decoding. It just changes the container without touching the video and audio data. Remuxing and transcoding are two completely different things.

 

Transcoding is NOT an umbrella term.

Transcoding is "The process of converting a media file or object from one format to another. Transcoding is often used to convert video formats (i.e., Beta to VHS, VHS to QuickTime, QuickTime to MPEG)".

If that's not a good enough source for you then here are a few more:

Wikipedia:

"Transcoding is the direct analog-to-analog or digital-to-digital conversion of one encoding to another,[1] such as for movie data files (e.g., PAL, SECAM, NTSC), audio files (e.g., MP3, WAV), or character encoding (e.g., UTF-8, ISO/IEC 8859).

 

HydrogenAudio: "Transcoding means converting a file from one encoding method (i.e. file format) to another. Transcoding can be performed from lossless to lossless, from lossless to lossy, from lossy to lossy, and from lossy to lossless."

 

SearchSOA: "There are a number of different ways that transcoding can take place but the overall process remains the same. The source format is translated into a raw intermediate format and then re-translated into a format the end user's device recognizes."

 

JWPlayer: "Transcoding is the process of taking digital media, extracting the tracks from the container, decoding those tracks, filtering (e.g. remove noise, scale dimensions, sharpen, etc), encoding the tracks, and multiplexing the new tracks into a new container."

 

MainstreamData: "Transcoding is the process of converting one digital encoding to another. This is something needed when a particular target device does not support the format (picture trying to play a CD on a record player) or does not contain enough storage capacity to support the file size (imagine trying to watch an IMAX 3D movie on an iPhone)."

 

TechoPedia: "Transcoding is the process of converting a file from one encoding format to another. This allows the conversion of incompatible data to a better-supported, more modern form of data. Transcoding is often performed if the target device does not support the format or has only limited storage capability."

 

Kaltura: "A transcode is made from taking an encoded piece of video and then converting it into one or more newly and more compressed streams that can then be played in a player on a computer or mobile device depending on the settings and methods used."

 

You will never hear anyone say "transcode" when they only mean decode. That's because transcode is defined as taking an encoded data stream and converting it into another data stream, and that requires first decoding and then encoding. It is NOT an umbrella term. You won't find a source that says that transcoding can refer to either encoding or decoding.

 

 

 

 

Transcoding is the umbrella term for the functions which encode and decode.

Ahaha, no. You are getting "transcoding" confused with "codec".

A codec is defined as "a device or computer program capable of encoding or decoding a digital data stream or signal".

 

So when you have been saying "transcode" you have actually meant to say "codec".

A codec is a program that can encode or decode for example video data like H.264. Codec is called codec because it is short for "coder-decoder".

Transcode means to transform code.

Muxing, which I mentioned before, comes from "multiplexing", which means taking several signals and putting them into one signal. In the audio and video world that means taking several data streams (such as video and audio) and putting them into a single file. The container then keeps track of which data is audio and which is video. This is explained in this video at 27:54.
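As a very rough sketch of what a muxer does (illustrative types only; real containers like MKV or MP4 obviously store far more per packet):

#include <algorithm>
#include <cstdint>
#include <vector>

// An already-encoded packet from one of the streams, tagged with which stream
// it belongs to so the demuxer can split the streams apart again at playback.
struct Packet {
    int stream_id;                 // e.g. 0 = video, 1 = audio
    int64_t timestamp;             // presentation time
    std::vector<uint8_t> payload;  // the encoded data itself
};

// Interleave video and audio packets by timestamp so the player never has to
// seek far ahead for the audio that belongs to the picture it is showing.
std::vector<Packet> mux(std::vector<Packet> video, std::vector<Packet> audio) {
    std::vector<Packet> container;
    container.reserve(video.size() + audio.size());
    container.insert(container.end(), video.begin(), video.end());
    container.insert(container.end(), audio.begin(), audio.end());
    std::stable_sort(container.begin(), container.end(),
                     [](const Packet& a, const Packet& b) { return a.timestamp < b.timestamp; });
    return container;
}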

 

I recommend you watch this Techquickie video. It will explain a lot of the basics which you seem to have gotten wrong.

 

 

 

I have encoded and decoded. We had to implement and prove the runtime characteristics of a custom algorithm as a 15% project. I implemented a video compression algorithm with a compression ratio of 12:1 and a retention ratio of 98%, running in about 3n*(log n)^2 time (implemented in C++ using the CilkPlus vectorization libraries for the similarity analysis). On my stock 2600K at school, a 40-minute video could be compressed under that scheme in 12 minutes in software and fully decompressed in 8, with a fairly low loss ratio. Now, it's not quite as compact as H.264, but a 10% memory delta isn't bad for a total amateur armed with nothing more than linear algebra and knowledge of modern programming standards. I'm saying we have the hardware to do it. You can prove it to yourself with a little mathematical induction. Software is a decade behind the instructions we have available today, across the board.

Sounds interesting. Got any link to it? I would like to compare encoding/decoding speed and compression ratio for myself if that's okay.

What settings did you use to create the H.264 file?

 

 

 

 

I'm not saying no to free performance. I'm saying it doesn't exist at all, and the trade off is stupid when we're not even at the limits of what we have now. The same is true of games being so far behind in multithreaded design. Even now the threads aren't balanced, and a little analysis under a profiler reveals much of the CPU usage is spin locks of the form:

while (mutex_locked) {}   // busy-wait: burns a full core while the lock is held elsewhere

mutex_locked = true;      // claim the lock (note: a check-then-set like this is not atomic)

// handle the race condition code

mutex_locked = false;     // release the lock

And if you don't believe me, write a C++ program that just runs an infinite loop checking a variable. The usage for that core skyrockets to almost 100%.

Wait, so you are saying that compressing/decompressing things with WinZip is just as fast with or without hardware acceleration enabled? That's strange because the benchmarks showed that WinZip becomes a lot faster when you enable it. Earlier in the thread you said that the extra speed didn't matter because people who need to compress and decompress things fast will have a beefy CPU to do it with anyway, and for the average Joe the few seconds didn't matter. That to me sounded like "no to free performance".

 

What trade off are you referring to?


I think you forget that 10bit sound creates a HUGE issue.

 

I got an old (1st gen) Atom processor in my old HTPC... sure, it could play a 1080p 8-bit MKV with H.264... but if you even consider 10bit, you're gonna BSOD from the CPU hanging itself.

Oh, you got an old HTPC? Could you try decoding a 1080p (8-bit) video on the CPU and post the results? I know a 10bit one won't work, but maybe an 8bit one will. Please make sure that it is not being decoded on the GPU, though, if you decide to test it.

 

I have to agree with Patrick on this one though. If you get a BSOD then it sounds like an issue with hardware (or possibly the programs you use). At worst it should just be extremely laggy.

 

 

 

 

Sound is also a delta-compressible data type, but I'll admit I've never analyzed it and don't know the techniques used today. That said, BSOD? No. Just buffer enough leeway to get started. Nothing should ever cause a BSOD in compression and decompression unless you manage to run out of memory and cause a stack segmentation fault.

It's just H.264 but with 10 bits of internal color precision instead of the 8 bits regular H.264 files have. It allows for smaller files at the same quality, but it breaks hardware-accelerated playback, since most GPUs don't support 10bit H.264 and it instead has to be decoded on the CPU.

