Why are LTT videos so inconsistent with the captioning?

aeiro

Hi All,

 

As a deaf person I _really_ appreciate the availability of captions on most of LTT's videos over the last couple months. I'm not sure what triggered this change from earlier videos, but I am grateful for it. The strange thing is, though, that while most videos do have captions, the time it takes to upload them varies; sometimes it's instant, sometimes it takes a few hours, sometimes it takes a few days, and sometimes it just doesn't happen at all. I'm typically happy to financially support creators and content that is fully accessible to me and other Deaf/HoH people, and it's really bugging me that LTT is so close, but not quite there yet. I'm curious if there's any way to gain insight into the captioning process and whether there's a better way to give feedback (I do have some minor feedback on inaccuracies in the subtitles but they've been...better than many other channels). 

 

Side note: I'd also really love if the WAN show started doing transcripts as well. Some podcasts do have transcripts, but they seem like they're few and far between. I really enjoy podcasts (and especially video ones) but it's difficult to consume them without some accessibility features (YouTube's auto craptions are...pretty bad if you have no hearing). 


I think I remember @JonoT (handles stuff like this, iirc) saying that they're done by a 3rd-party company or team. As for transcripts, given WAN is not something that is hugely popular (at least not like their main videos), I doubt transcripts will happen. Or at least they wouldn't be non-automated. It doesn't make much sense.

Either @piratemonkey or quote me when responding to me. I won't see otherwise

Put a reaction on my post if I helped

My privacy guide | Why my name is piratemonkey PSU Tier List Motherboard VRM Tier List

What I say is from experience and the internet, and may not be 100% correct


I'm not really sure about the answer to OP's questions.

 

What I will say is that someone REALLY needs to proof-read the captions. There are some cases where what has been typed is entirely inconsistent with what is being said, and doesn't even make any sense. This is mostly a problem with tech terminology (I've seen "PCIe" mistranscribed a few times, for instance.)

 

Given how much time is put into video production at LMG otherwise, it comes across as a bit lazy to me that adding and proofreading the captions isn't just part of the regular video production workflow. Most of the videos are scripted anyway - it's just a matter of uploading the script to YouTube, tweaking the timings, and correcting any mistakes. Not complicated for an organisation as huge as LMG.

 

I don't have hearing difficulties myself, but I do still appreciate the captions for times when I'd prefer not to have the sound on.

____________________________________________________________________________________________________________________________________

 

 

____________________________________________________________________________________________________________________________________

pythonmegapixel

into tech, public transport and architecture // amateur programmer // youtuber // beginner photographer

Thanks for reading all this by the way!

By the way, my desktop is a docked laptop. Get over it. No, seriously: I have an external monitor, keyboard, mouse, headset, ethernet and cooling fans all connected. Using it feels no different to a desktop, it works for several hours if the power goes out, and disconnecting just a few cables gives me something I can take on the go. There's enough power for all games I play and it even copes with basic (and some not-so-basic) video editing. Give it a go - you might just love it.


3 minutes ago, pythonmegapixel said:

What I will say is that someone REALLY needs to proof-read the captions. There are some cases where what has been typed is entirely inconsistent with what is being said, and doesn't even make any sense. This is mostly a problem with tech terminology (I've seen "PCIe" mistranscribed a few times, for instance.)

Agreed - even things that should be obvious from context like 1080p often get mistranscribed to ten ADP (took me a few tries to understand myself). I'm sure that the actual captioning process is farmed out, but I suspect they're not getting any quality guarantees from the people they've contracted with, or if they are, nobody's enforcing them. My best guess is that the captions are 90-95% accurate, which is not terrible (and better than autocaptions for sure), but that still means a mistranscribed word every sentence or so. Industry standard is 99% accuracy, and I've seen 99.99% guarantees before (basically one mistranscription in the entire script). I do wish that they did the captioning as a prerequisite before uploading. It's clear that they strive for high quality video and audio; it's just frustrating that the same standards aren't applied to accessibility features as well.
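To put rough numbers on those accuracy figures, here is a minimal sketch of the arithmetic. The sentence length (15 words) and script length (2,000 words) are my own illustrative assumptions, not measurements of any LTT video, and `expected_errors` is just a hypothetical helper name.

```python
def expected_errors(accuracy_percent: float, word_count: int) -> float:
    """Expected number of mistranscribed words at a given word-level accuracy."""
    return (1 - accuracy_percent / 100) * word_count


# Both lengths below are illustrative assumptions, not measurements.
WORDS_PER_SENTENCE = 15
WORDS_PER_SCRIPT = 2000

for accuracy in (90, 95, 99, 99.99):
    per_sentence = expected_errors(accuracy, WORDS_PER_SENTENCE)
    per_script = expected_errors(accuracy, WORDS_PER_SCRIPT)
    print(f"{accuracy}% accurate: ~{per_sentence:.2f} errors per sentence, "
          f"~{per_script:.1f} per script")
```

Under those assumptions, 90-95% accuracy works out to roughly one wrong word per sentence, while 99.99% leaves less than one wrong word in the whole script.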


I don't think their videos are word for word read from the teleprompter. So it's not as simple as copying lines from a document.

Intel® Core™ i7-12700 | GIGABYTE B660 AORUS MASTER DDR4 | Gigabyte Radeon™ RX 6650 XT Gaming OC | 32GB Corsair Vengeance® RGB Pro SL DDR4 | Samsung 990 Pro 1TB | WD Green 1.5TB | Windows 11 Pro | NZXT H510 Flow White
Sony MDR-V250 | GNT-500 | Logitech G610 Orion Brown | Logitech G402 | Samsung C27JG5 | ASUS ProArt PA238QR
iPhone 12 Mini (iOS 17.2.1) | iPhone XR (iOS 17.2.1) | iPad Mini (iOS 9.3.5) | KZ AZ09 Pro x KZ ZSN Pro X | Sennheiser HD450bt
Intel® Core™ i7-1265U | Kioxia KBG50ZNV512G | 16GB DDR4 | Windows 11 Enterprise | HP EliteBook 650 G9
Intel® Core™ i5-8520U | WD Blue M.2 250GB | 1TB Seagate FireCuda | 16GB DDR4 | Windows 11 Home | ASUS Vivobook 15 
Intel® Core™ i7-3520M | GT 630M | 16 GB Corsair Vengeance® DDR3 |
Samsung 850 EVO 250GB | macOS Catalina | Lenovo IdeaPad P580


11 minutes ago, BlueChinchillaEatingDorito said:

I don't think their videos are word for word read from the teleprompter. So it's not as simple as copying lines from a document.

Sometimes YouTube channels take the CC from the teleprompter, but the video changes slightly, and then the captions make no sense.

I could use some help with this!

please, pm me if you would like to contribute to my gpu bios database (includes overclocking bios, stock bios, and upgrades to gpus via modding)

Bios database


1 hour ago, HelpfulTechWizard said:

Sometimes YouTube channels take the CC from the teleprompter, but the video changes slightly, and then the captions make no sense.

Exactly. Especially for a lot of LTT content where it's basically not done sitting in front of a camera and you have multiple hosts, it's going to veer away from the script a lot. And for TechLinked, when it's basically just memeing about news (with the butting in from Riley or James in the background), a lot of what's said isn't going to be in the script.


I didn't think anyone was actively adding CC; I thought they were just added as a "best guess" by YouTube.

 

Slayerking92

<Type something witty here>
<Link to some pcpartpicker fantasy build and claim as my own>


19 minutes ago, Slayerking92 said:

I didn't think anyone was actively adding CC; I thought they were just added as a "best guess" by YouTube.

 

If it's that kind, then they're always flagged as auto-generated.

 

Someone is definitely manually inputting them - just doing a crap job of it. Which frankly isn't acceptable for a channel as big as LTT.

 

If you're big enough to afford to spend thousands of $ to get a machine purely for review purposes, then you can afford to pay someone who actually knows about the subject matter to spend an hour or two ensuring that the captions are correct. No exceptions. This isn't some obscure "neat little feature" - it's an essential accessibility device.


It looks like the most recent video has been captioned while the video before it has not yet been captioned on the main LinusTechTips channel. Any updates from the LTT team about this?


I'm pretty sure this isn't LTT, it's YouTube...

 

 

They recently banned captions made by humans and now only use their robot algorithm. 

 

Completely insane. I'd either choose another form of entertainment or take it up with YouTube, but you'll probably just get to talk with another robot.

 

 

Greetings from 1982~

The direction tells you... the direction

-Scott Manley, 2021

 

Softwares used:

Corsair Link (Anime Edition) 

MSI Afterburner 

OpenRGB

Lively Wallpaper 

OBS Studio

Shutter Encoder

Avidemux

FSResizer

Audacity 

VLC

WMP

GIMP

HWiNFO64

Paint

3D Paint

GitHub Desktop 

Superposition 

Prime95

Aida64

GPUZ

CPUZ

Generic Logviewer

 

 

 


3 hours ago, Mark Kaine said:

Yes, you have entirely the wrong end of the stick there.

 

YouTube have turned off captions made by members of the community other than the video creator. It is still entirely possible for the creator to input captions themselves.

 

And I don't think it's acceptable that they aren't doing this in a timely manner.


  • 3 weeks later...

I've noticed a couple of improvements on recent videos, but still the length of time is extremely variable, which is frustrating. I'd also really like to push for WAN show to have captioning - at least the recordings after the fact (though it'd certainly be awesome if the live shows had captioning too).


I also wanted to point out that TechLinked episodes rarely get captions quickly. However, due to the nature of TechLinked, by the time they get captions several days or so after release, they're out of date and not relevant anymore. 


55 minutes ago, aeiro said:

I also wanted to point out that TechLinked episodes rarely get captions quickly. However, due to the nature of TechLinked, by the time they get captions several days or so after release, they're out of date and not relevant anymore. 

 

Unfortunately there are only two "not entirely unreasonable" ways around that, and both require changes to LMG's video pipeline:

1) ASR - Automated Speech recognition. This has to be done straight off the microphone audio before any filtering is done. The best ASR however states a word-error rate around 5% which is about the same for humans.

2) ASR + Manual transcription for live broadcasts where the ASR puts captions into the video, and someone goes back and edits it later since ASR stumbles on voices that speak over each other.

 

YouTube's auto captions are basically ASR, but not trained on LMG's voices. So it will likely have that 5-7% WER on English, and worse in auto-translation (Speech Translation). If LMG wanted, they could probably build a box with a pair of Titan RTXs or similar to run ASR in-house so they could train it on their own voices, but that still requires running everything they've produced through it to learn, and getting real-time captions is still a bit of a crap-shoot.

 

Anyhow this gets back to why paying someone to manually transcribe live video might be cheaper, less effort and more accurate if the person transcribing the video actually knows the vernacular. 


2 minutes ago, Kisai said:

1) ASR - Automated Speech recognition. This has to be done straight off the microphone audio before any filtering is done. The best ASR however states a word-error rate around 5% which is about the same for humans.

This is false. Humans can have WERs of 1% or even lower if they're sufficiently trained or experienced. A 5-7% WER for humans is typically only for untrained/inexperienced people. In fact, I frequently make use of realtime stenocaptioners (AKA CART providers) in my job, and they are often able to achieve 98-99% accuracy in realtime, with highly technical/scientific terminology. I don't necessarily expect LTT to hire the best of the best, but it's clear that they invest high levels of money in their video, audio, and editing pipeline, so why not invest a (relatively) marginal amount of extra money into their captioning/accessibility pipeline?

2 minutes ago, Kisai said:

2) ASR + Manual transcription for live broadcasts where the ASR puts captions into the video, and someone goes back and edits it later since ASR stumbles on voices that speak over each other.

I strongly suspect (based on the consistent mistranscriptions that I see in their videos) that this is already the strategy they use. It's not a terrible strategy (especially for time constrained editing) but it's clear that whoever they've contracted to do it is not taking the time and care that LTT puts into the rest of their videos. 

2 minutes ago, Kisai said:

 

YouTube's auto captions are basically ASR, but not trained on LMG's voices. So it will likely have that 5-7% WER on English, and worse in auto-translation (Speech Translation). If LMG wanted, they could probably build a box with a pair of Titan RTXs or similar to run ASR in-house so they could train it on their own voices, but that still requires running everything they've produced through it to learn, and getting real-time captions is still a bit of a crap-shoot.

 

Anyhow this gets back to why paying someone to manually transcribe live video might be cheaper, less effort and more accurate if the person transcribing the video actually knows the vernacular. 

I don't want LTT to stop paying someone to do manual transcriptions; I still think that manual transcriptions are the best solution for almost every situation today. I just wish LTT would strive for the same level of quality for the transcriptions as they strive for the video and audio. I can just about guarantee that a skilled transcriptionist using the right pipeline would be capable of accurately transcribing the videos in less time than it takes to produce the final cut.


34 minutes ago, aeiro said:

This is false. Humans can have WERs of 1% or even lower if they're sufficiently trained or experienced. A 5-7% WER for humans is typically only for untrained/inexperienced people.

That needs to be highlighted, because the AI research suggests otherwise.

https://arxiv.org/ftp/arxiv/papers/1904/1904.12403.pdf

Quote

In this study, we used human transcribers and ASR systems to transcribe videoconferencing medical conversations. We found that the two manual transcriptions demonstrated similar quality with WER of 17.4%. This is higher than the WER of previous studies based on the standard telephone audio recording dataset where the manually transcribed WER was between 5.1% and 5.9%

 

 

34 minutes ago, aeiro said:

 

In fact, I frequently make use of realtime stenocaptioners (AKA CART providers) in my job, and they are often able to achieve 98-99% accuracy in realtime, with highly technical/scientific terminology. I don't necessarily expect LTT to hire the best of the best, but it's clear that they invest high levels of money in their video, audio, and editing pipeline, so why not invest a (relatively) marginal amount of extra money into their captioning/accessibility pipeline?

 

That is a good question, and perhaps it's just a question of getting people who know the vernacular and terminology. 

 

34 minutes ago, aeiro said:

I strongly suspect (based on the consistent mistranscriptions that I see in their videos) that this is already the strategy they use. It's not a terrible strategy (especially for time constrained editing) but it's clear that whoever they've contracted to do it is not taking the time and care that LTT puts into the rest of their videos. 

Well, if YouTube's auto captions are about on par with untrained human transcription (see linked article above), then that's probably it.

 

Personally I always run videos with subtitles on, regardless of the language.


11 hours ago, Kisai said:

That needs to be highlighted, because the AI research suggests otherwise.

https://arxiv.org/ftp/arxiv/papers/1904/1904.12403.pdf

I'm not sure that study proves what you think it does. From the conclusion: 

Quote

We found that manual transcription significantly outperformed the automatic services...We posit that these findings could be generalized to other contexts.

I can't find anything showing that the WER of human transcriptionists is 5% or greater. It doesn't seem that the study goes into detail about it. In fact, they say:

 

Quote

We selected Manual CB as the reference transcript and completed a pairwise analysis for the remaining transcription services comparing the quality of all of the transcription services.

Which makes me think that they're confident in a very very low WER for humans. 

 

Besides, WER isn't the be-all and end-all of transcription quality. I haven't found an academic study that takes into account identification of different speakers, minor changes that have a big impact on the meaning ("I think this isn't stupid" vs "I think this is stupid" have two completely different meanings, but that change weighs no more on WER than "I am going to be sad" vs "I am gonna be sad", which are functionally equivalent), or natural breaks/pauses in the subtitles. Humans do all of these far better than computers.
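To make the WER point concrete, here is a minimal sketch of how WER is commonly computed: word-level edit distance (substitutions, insertions and deletions) divided by the number of words in the reference. The function name and the plain lowercase/whitespace tokenization are my own assumptions; real evaluation tools usually normalize text more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # delete a reference word
                curr[j - 1] + 1,                  # insert a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1] / len(ref)


meaning_flipped = word_error_rate("I think this isn't stupid",
                                  "I think this is stupid")
harmless = word_error_rate("I am going to be sad",
                           "I am gonna be sad")
print(f"meaning flipped: {meaning_flipped:.2f}, harmless rewording: {harmless:.2f}")
```

With this simple tokenization, the harmless "gonna" rewording counts as two word edits against a six-word reference, so it actually scores worse than the single substitution that flips the meaning. The metric counts changed words and has no notion of what they do to the sentence.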

 

LTT shouldn't be using AI for the subtitles for the same reason they don't use AI to edit videos or audio. 

 


  • 3 months later...

Two of the recent videos still don't have captions, despite it being several days. Why does this keep happening? It's making it very frustrating to be an LTT follower when it's a grab bag whether the videos are inclusive and accessible or not.

 

 


And the most recent video, while it does have captions added a while after uploading, about halfway through the captions are completely different from what is being said on screen. Given that I've been complaining about captioning quality issues since November and nobody seems to be listening, is there an alternative way of reporting video quality issues that is more effective?

 

I also want to point out that the verified gamer program was sold out prior to captions being added to the video. So it's kind of unfair for deaf folks like me to not even have the chance to participate. 
