AI training time estimation

Wictorian · December 20, 2021

So I want to train an AI to detect tuberculosis on lung films. I have a dataset of around 16,500 pictures. My question is: can you estimate how long it would take to train the AI?

igormp · December 20, 2021

It depends on which model you base your own model on, what kind of hardware you're using to train, and (to a lesser extent) your hyperparameters.
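None of those factors can be read off the dataset size alone, but once you have measured throughput (images per second) for your chosen model on your own hardware, the rest is simple arithmetic. A minimal sketch; the 50 images/s and 30 epochs in the usage lines are hypothetical placeholders, not figures from this thread:

```python
# Back-of-envelope training-time estimate. The throughput figure is an
# assumption here: measure it yourself with your model and hardware.


def estimate_training_hours(num_images: int, images_per_second: float, epochs: int) -> float:
    """Estimated wall-clock hours for `epochs` full passes over the dataset.

    Ignores validation passes, data-loading stalls and checkpointing,
    so treat the result as a lower bound.
    """
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    seconds_per_epoch = num_images / images_per_second
    return seconds_per_epoch * epochs / 3600.0


if __name__ == "__main__":
    # Hypothetical: 16,500 images at a measured 50 images/s, trained for 30 epochs.
    hours = estimate_training_hours(16_500, images_per_second=50.0, epochs=30)
    print(f"~{hours:.2f} h")
```

Doubling the epochs or halving the throughput doubles the estimate, which is why the model and hardware matter more than anything else.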

Also, 16,500 pictures is too small a dataset to properly train a reasonably working model, unless you can get a base model that's already close to what you want and use your dataset to fine-tune it.
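The simplest form of that idea (sometimes called linear probing) keeps the pre-trained network frozen and trains only a new classification head on your own labels. A stdlib-only sketch under loud assumptions: `frozen_backbone` is a hypothetical stand-in for a real pre-trained feature extractor, and toy 2-D points replace actual images. Real fine-tuning often also unfreezes some backbone layers.

```python
import math
import random


def frozen_backbone(x):
    """Hypothetical stand-in for a pre-trained feature extractor.

    Its "weights" are fixed; nothing below ever updates them.
    """
    return (x[0] + x[1], x[0] - x[1])


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(samples, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression head on frozen features with plain SGD."""
    feats = [frozen_backbone(x) for x in samples]  # computed once: backbone stays frozen
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            g = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y  # d(log-loss)/dz
            w[0] -= lr * g * f[0]
            w[1] -= lr * g * f[1]
            b -= lr * g
    return w, b


def predict(head, x):
    w, b = head
    f = frozen_backbone(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0


def make_data(rng, n):
    """Toy 2-D data: class 0 near (0, 0), class 1 near (2, 2)."""
    xs, ys = [], []
    for label, c in ((0, 0.0), (1, 2.0)):
        for _ in range(n):
            xs.append((rng.gauss(c, 0.5), rng.gauss(c, 0.5)))
            ys.append(label)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    head = train_head(*make_data(rng, 20))  # only 40 labeled samples
    test_x, test_y = make_data(rng, 500)
    acc = sum(predict(head, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
    print(f"test accuracy with only the head trained: {acc:.3f}")
```

Because only three parameters are learned, a small labeled set can be enough, provided the frozen features already separate the classes.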

Montana One-Six · December 21, 2021

I've been doing machine learning with satellite imagery for the last 4 years. Overall, we use far fewer pictures, since we work with multispectral imagery (meaning more bands than just RGB) plus spectral indices, and we use object-based or pixel-based classification methods, so we don't compare one picture to the next. So it may not be comparable to your work. I usually use R or Python for that, with Random Forest or some neural network models. Depending on how complex your model is and how fast your machine is, training can take a lot of time. I recently worked with UAV pictures, and training a Random Forest model (with very high accuracy) on them took about 28 hours.
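To illustrate what "multispectral plus indices" with pixel-based classification means: every pixel becomes its own sample, with its band values plus derived spectral indices as features. A sketch using NDVI as an example index (the post doesn't say which indices are used) and a hypothetical four-band layout; the resulting rows would then go to a classifier such as Random Forest:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def pixel_features(pixels):
    """One feature row per pixel: the raw band values plus NDVI.

    Each pixel is a dict with hypothetical keys 'blue', 'green', 'red', 'nir'
    (reflectance values); the real band order depends on the sensor.
    """
    rows = []
    for p in pixels:
        bands = [p["blue"], p["green"], p["red"], p["nir"]]
        rows.append(bands + [ndvi(p["nir"], p["red"])])
    return rows


if __name__ == "__main__":
    pixels = [
        {"blue": 0.04, "green": 0.08, "red": 0.05, "nir": 0.45},  # made-up vegetated pixel
        {"blue": 0.10, "green": 0.12, "red": 0.15, "nir": 0.18},  # made-up bare-soil pixel
    ]
    for row in pixel_features(pixels):
        print([round(v, 3) for v in row])
```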

Montana One-Six · December 21, 2021

On 12/20/2021 at 2:54 PM, igormp said:

Also, 16,500 pictures is too small a dataset to properly train a reasonably working model

I don't work in the medical field and have zero medical knowledge at that level, but the biggest factor in training a model is usually how different the classes you want to predict are. Looking at it from my perspective: it's like putting all the pictures into a mosaic and treating each one as an individual pixel. So if a tuberculosis pixel looks very different from a healthy pixel, it doesn't take all that much to get a fairly good working model. Not too long ago I had a model with one class (which covered 15-20% of all pixels in the image) that could be predicted with about 90% accuracy using just 50 training pixels from an image of more than 450,000 pixels.
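A toy experiment on synthetic 2-D data (not the poster's satellite data) shows the point about class separation: the same 50 training points (25 per class) give very different accuracy depending only on how far apart the classes are. It uses a nearest-centroid classifier purely for brevity; the class centers and spread are arbitrary assumptions.

```python
import random


def sample_class(rng, center, n, spread=1.0):
    """Draw n 2-D points from a Gaussian blob around `center`."""
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread)) for _ in range(n)]


def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest_centroid_accuracy(center_a, center_b, n_train=25, n_test=2000, seed=0):
    """Train on n_train points per class, return accuracy on fresh test points."""
    rng = random.Random(seed)
    ca = centroid(sample_class(rng, center_a, n_train))
    cb = centroid(sample_class(rng, center_b, n_train))

    def predict(p):
        da = (p[0] - ca[0]) ** 2 + (p[1] - ca[1]) ** 2
        db = (p[0] - cb[0]) ** 2 + (p[1] - cb[1]) ** 2
        return "a" if da <= db else "b"

    test = [(p, "a") for p in sample_class(rng, center_a, n_test)]
    test += [(p, "b") for p in sample_class(rng, center_b, n_test)]
    correct = sum(predict(p) == label for p, label in test)
    return correct / len(test)


if __name__ == "__main__":
    # Identical training-set size; only the distance between the classes changes.
    print("well separated:", nearest_centroid_accuracy((0, 0), (3, 3)))
    print("overlapping:   ", nearest_centroid_accuracy((0, 0), (0.5, 0.5)))
```

With well-separated classes the accuracy lands far above what the overlapping case can reach, even though both runs see exactly 50 training points.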

igormp · December 21, 2021

8 minutes ago, Montana16 said:

I don't work in the medical field and have zero medical knowledge at that level, but the biggest factor in training a model is usually how different the classes you want to predict are. Looking at it from my perspective: it's like putting all the pictures into a mosaic and treating each one as an individual pixel. So if a tuberculosis pixel looks very different from a healthy pixel, it doesn't take all that much to get a fairly good working model. Not too long ago I had a model with one class (which covered 15-20% of all pixels in the image) that could be predicted with about 90% accuracy using just 50 training pixels from an image of more than 450,000 pixels.

I have no idea about OP's dataset (resolution, how it's balanced and whatnot), but it also depends on whether they're using a pre-trained model (in which case you'd just fine-tune it for your needs) or building one from scratch (in which case you'd need way more data).
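Checking how the dataset is balanced takes a few lines. If one class dominates, a common remedy is to weight the loss by inverse class frequency; the formula below is the same heuristic scikit-learn uses for `class_weight="balanced"`. The label names and counts in the example are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    A perfectly balanced dataset gets weight 1.0 for every class;
    rarer classes get weights above 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


if __name__ == "__main__":
    labels = ["tb"] * 100 + ["healthy"] * 300  # hypothetical 1:3 imbalance
    print(Counter(labels))
    print(class_weights(labels))
```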

Using too little data might lead your model to overfit to your samples, which may not be a problem (and may even be what you want) if your data has low variance. However, for something like medical diagnosis, you want not only high accuracy; you also need to keep false positives and false negatives in mind, or you risk a really wrong diagnosis.
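Why accuracy alone isn't enough is easy to show: on an imbalanced test set, a model that never flags anyone can still score high accuracy while missing every sick patient. A small sketch that counts the confusion-matrix cells and derives sensitivity and specificity; the example labels are made up.

```python
def diagnostic_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, sensitivity and specificity."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # healthy case flagged as sick
        else:
            if t == positive:
                fn += 1  # sick case missed
            else:
                tn += 1

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of sick cases caught
        "specificity": ratio(tn, tn + fp),  # share of healthy cases cleared
        "false_negatives": fn,
        "false_positives": fp,
    }


if __name__ == "__main__":
    # Made-up test set: 95 healthy (0), 5 sick (1); the model predicts "healthy" for everyone.
    print(diagnostic_metrics([0] * 95 + [1] * 5, [0] * 100))
```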

Montana One-Six · December 21, 2021

2 minutes ago, igormp said:

I have no idea about OP's dataset (resolution, how it's balanced and whatnot), but it also depends on whether they're using a pre-trained model (in which case you'd just fine-tune it for your needs) or building one from scratch (in which case you'd need way more data).

Using too little data might lead your model to overfit to your samples, which may not be a problem (and may even be what you want) if your data has low variance. However, for something like medical diagnosis, you want not only high accuracy; you also need to keep false positives and false negatives in mind, or you risk a really wrong diagnosis.

Like I said, I was just projecting my knowledge of image-based classification from remote sensing onto this, which probably isn't all that comparable.

Yeah, same here. It sounded like he has 16,500 pictures of lungs with tuberculosis. So he just needs pictures of healthy lungs, and then he could make a model that should produce fairly good results with these two classes (healthy and tuberculosis).
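Once there is a set of healthy scans next to the tuberculosis scans, the two lists can be labeled and split into training and validation sets per class (a stratified split), so both sets keep the same healthy/TB ratio. A sketch; the file names are hypothetical and the 20% validation share is an arbitrary choice:

```python
import random


def stratified_split(tb_files, healthy_files, val_fraction=0.2, seed=42):
    """Label both file lists (TB = 1, healthy = 0) and hold out
    `val_fraction` of each class separately for validation."""
    rng = random.Random(seed)
    train, val = [], []
    for files, label in ((tb_files, 1), (healthy_files, 0)):
        items = [(f, label) for f in files]
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


if __name__ == "__main__":
    tb = [f"tb_{i}.png" for i in range(100)]          # hypothetical file names
    healthy = [f"healthy_{i}.png" for i in range(300)]
    train, val = stratified_split(tb, healthy)
    print(len(train), "train /", len(val), "val;", sum(y for _, y in val), "TB cases in val")
```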

I was also thinking about false positives/negatives. Since there are a lot of diseases and other things affecting the lungs, you'd probably need to classify into a shit-ton of different classes. I have seen some research papers where they classified things like that, but they never really used that many different classes.

igormp · December 21, 2021

2 hours ago, Montana16 said:

I have seen some research papers where they classified things like that, but they never really used that many different classes.

There's a nice paper showing that most of the papers that try to identify COVID are useless due to flawed methodology:

https://www.nature.com/articles/s42256-021-00307-0

Quote

Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases.

A similar story applies to other models that focus on medical imaging.

One false negative and a person dies of a disease; one false positive and you've sent a healthy person through a myriad of other procedures.
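That asymmetry can be made explicit when choosing the decision threshold on a validation set. A minimal sketch, assuming the model outputs a score per case and that you assign relative costs to a missed case (`cost_fn`) and a false alarm (`cost_fp`); the scores, labels and costs in the example are made up for illustration:

```python
def pick_threshold(scores, labels, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimizing cost_fn * FN + cost_fp * FP.

    A case is flagged positive when its score >= threshold. The costs are a
    judgment call you have to make; no model supplies them.
    """
    candidates = sorted(set(scores)) + [float("inf")]  # inf = flag nobody
    best_t, best_cost = None, float("inf")
    for t in candidates:
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


if __name__ == "__main__":
    scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]  # made-up model scores
    labels = [0, 0, 1, 0, 1, 1]               # 1 = disease present
    print("missed cases expensive:", pick_threshold(scores, labels, cost_fn=10, cost_fp=1))
    print("false alarms expensive:", pick_threshold(scores, labels, cost_fn=1, cost_fp=10))
```

Making misses expensive pushes the threshold down (more people flagged for follow-up); making false alarms expensive pushes it up.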
