
It depends on which model you base your own model on, what kind of hardware you are training on, and (to a lesser extent) your hyperparameters.

 

Also, 16500 pictures is too small a dataset to properly train a reasonably working model, unless you can get a base model that's already close to what you want and use your dataset to fine-tune it.
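To make the dependence on hardware and hyperparameters a bit more concrete, here is a rough back-of-the-envelope sketch. Only the 16500 image count comes from this thread; the batch size, epoch count, and seconds-per-step figures are made-up placeholders that you would have to measure on your own model and machine.

```python
import math

def estimate_training_seconds(num_images, batch_size, epochs, seconds_per_step):
    """Rough wall-clock estimate: steps per epoch * epochs * time per step."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs * seconds_per_step

# 16500 images is the dataset size mentioned in the thread.
# Everything else below is a hypothetical placeholder, not a measurement.
for label, seconds_per_step in [("faster hardware", 0.1), ("slower hardware", 0.5)]:
    seconds = estimate_training_seconds(
        num_images=16500, batch_size=32, epochs=20, seconds_per_step=seconds_per_step
    )
    print(f"{label}: about {seconds / 3600:.2f} hours")
```

The time per step is where the model architecture and the hardware show up, and the batch size and epoch count are where the hyperparameters show up. Fine-tuning a pre-trained model usually needs fewer epochs than training from scratch, which shrinks the estimate accordingly.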

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


I have been doing machine learning with satellite imagery for the last 4 years. Overall we use far fewer pictures, since we work with multispectral imagery (meaning more bands than RGB) plus indices, and with object-based or pixel-based classification methods, so we don't compare one picture to the next. So it may not be comparable to your work. I usually use R or Python for that, with RandomForest or some neural network models. Depending on how complex your model is and how fast your machine is, training can take a lot of time. I recently worked with UAV pictures, and training a RandomForest model (with very high accuracy) on them took about 28 hours.

Desktop: i9-10850K [Noctua NH-D15 Chromax.Black] | Asus ROG Strix Z490-E | G.Skill Trident Z 2x16GB 3600Mhz 16-16-16-36 | Asus ROG Strix RTX 3080Ti OC | SeaSonic PRIME Ultra Gold 1000W | Samsung 970 Evo Plus 1TB | Samsung 860 Evo 2TB | CoolerMaster MasterCase H500 ARGB | Win 10

Display: Samsung Odyssey G7A (28" 4K 144Hz)

 

Laptop: Lenovo ThinkBook 16p Gen 4 | i7-13700H | 2x8GB 5200Mhz | RTX 4060 | Linux Mint 21.2 Cinnamon


On 12/20/2021 at 2:54 PM, igormp said:

Also, 16500 pictures is too small a dataset to properly train a reasonably working model

I don't work in the medical field and have zero knowledge of medicine at that level, but the biggest factor in training a model is usually how different the classes you want to predict are. Looking at it from my perspective: it's like putting all the pictures in a mosaic and treating them like individual pixels. So if the pixel with tuberculosis looks very different from a healthy pixel, it doesn't take all that much to get a fairly good working model. Not too long ago I had a model with one class (covering 15-20% of all pixels in the image) that could be predicted about 90% correctly using just 50 pixels of an image with more than 450000 pixels.
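The point about class separability can be illustrated with a toy sketch. Everything here is an assumption for illustration: the pixels are synthetic single-band values, the class means and spreads are invented, and the classifier is a plain nearest-centroid rule swapped in for simplicity (not the RandomForest mentioned above). Only the 50 training pixels and the roughly 18% class coverage echo the numbers in the post.

```python
import random

def make_pixels(n, mean_background, mean_target, spread, target_share=0.18, seed=0):
    """Generate n synthetic single-band pixels as (value, label) pairs."""
    rng = random.Random(seed)
    pixels = []
    for _ in range(n):
        label = 1 if rng.random() < target_share else 0
        mean = mean_target if label == 1 else mean_background
        pixels.append((rng.gauss(mean, spread), label))
    return pixels

def train_and_score(pixels, n_train=50, seed=1):
    """Fit a nearest-centroid rule on n_train pixels, return accuracy on the rest."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for index, (_, label) in enumerate(pixels):
        by_class[label].append(index)
    # Stratified draw so both classes appear in the tiny training set.
    n_target = max(1, n_train // 5)
    train_idx = set(rng.sample(by_class[1], n_target))
    train_idx |= set(rng.sample(by_class[0], n_train - n_target))
    # One centroid (mean value) per class, from the training pixels only.
    centroids = {}
    for cls in (0, 1):
        values = [pixels[i][0] for i in train_idx if pixels[i][1] == cls]
        centroids[cls] = sum(values) / len(values)
    correct = total = 0
    for index, (value, label) in enumerate(pixels):
        if index in train_idx:
            continue  # score only on pixels the model has not seen
        predicted = min(centroids, key=lambda cls: abs(value - centroids[cls]))
        correct += int(predicted == label)
        total += 1
    return correct / total

# Hypothetical class statistics: clearly separated vs. heavily overlapping.
separated = make_pixels(20000, mean_background=0.2, mean_target=0.8, spread=0.05)
overlapping = make_pixels(20000, mean_background=0.45, mean_target=0.55, spread=0.1)
print("separated classes:  ", train_and_score(separated))
print("overlapping classes:", train_and_score(overlapping))
```

With the same 50 training pixels, the separated case should score much higher than the overlapping one, which is the effect described above. Whether tuberculosis and healthy lungs behave more like the first or the second case is exactly what this toy example cannot tell you.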


8 minutes ago, Montana16 said:

I don't work in the medical field and have zero knowledge of medicine at that level, but the biggest factor in training a model is usually how different the classes you want to predict are. Looking at it from my perspective: it's like putting all the pictures in a mosaic and treating them like individual pixels. So if the pixel with tuberculosis looks very different from a healthy pixel, it doesn't take all that much to get a fairly good working model. Not too long ago I had a model with one class (covering 15-20% of all pixels in the image) that could be predicted about 90% correctly using just 50 pixels of an image with more than 450000 pixels.

I have no idea about OP's dataset (resolution, how it's balanced and whatnot), but it also depends on whether they're using a pre-trained model (in which case you'd just fine-tune it for your needs) or building one from scratch (in which case you'd need way more data).

Using too little data might lead your model to overfit on your values, which may not be a problem, and may even be what you want, if your data has low variance. However, for something like medical issues, you not only want high accuracy but also need to keep the false positives and false negatives in mind, or you risk a seriously wrong diagnosis.
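To show why accuracy alone can mislead here, a small sketch follows. The case counts are invented purely for illustration (1000 patients, 50 of them sick) and do not come from OP's dataset or from any real study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = diseased)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

def summarize(y_true, y_pred):
    """Return accuracy, sensitivity (recall on sick) and specificity (recall on healthy)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical outcome: 30 sick patients caught, 20 missed,
# 10 healthy patients flagged by mistake, 940 correctly cleared.
y_true = [1] * 30 + [1] * 20 + [0] * 10 + [0] * 940
y_pred = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 940

for name, value in summarize(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the accuracy looks excellent because healthy patients dominate the data, while 20 of the 50 sick patients are still missed. That is why sensitivity and specificity should be reported next to accuracy for this kind of model.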


2 minutes ago, igormp said:

I have no idea about OP's dataset (resolution, how it's balanced and whatnot), but it also depends on whether they're using a pre-trained model (in which case you'd just fine-tune it for your needs) or building one from scratch (in which case you'd need way more data).

Using too little data might lead your model to overfit on your values, which may not be a problem, and may even be what you want, if your data has low variance. However, for something like medical issues, you not only want high accuracy but also need to keep the false positives and false negatives in mind, or you risk a seriously wrong diagnosis.

Like I said, I was just projecting my knowledge of image-based classification from remote sensing onto that, which probably isn't all that comparable.

Yeah, same for me. It sounded like he has 16500 pictures of lungs with tuberculosis. In that case he just needs pictures of healthy lungs, and he could build a model that should produce fairly good results with these two classes (healthy and tuberculosis).

I was also thinking about false positives/negatives. Since there are a lot of diseases and other things affecting the lungs, you would probably need to classify into a shit-ton of different classes. I have seen some research papers where they classified things like that but never really used that many different classes.


2 hours ago, Montana16 said:

I have seen some research papers where they classified things like that but never really used that many different classes.

There's a nice paper showing how most of those other papers that try to identify covid are useless due to flawed methodology:

https://www.nature.com/articles/s42256-021-00307-0

 

Quote

Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases.

 

A similar story applies to other models that focus on medical imaging.

One false negative and a person dies of a disease that was missed; one false positive and you send a healthy person through a myriad of other procedures.

