
Theoretical performance of image recognition

OhYou_

Just wondering if anyone has a rough estimate of the expected performance when doing image recognition in video.
The video resolution I think would be good is 2 MP, and I'm wondering how many FPS of processing I can get out of, say, an RTX 4090, or a high-end CPU if that's better.


I have the idea because I got to work with a few optical-based sorting systems, and they use an archaic method of comparing pixel intensity and grouping sizes to determine good from bad.
Surely in 2024 this can be replaced with a bit of machine learning at a far higher resolution and speed.
If it could process 120 frames per second at 2 MP each while doing basic image recognition, that would be way better.


It really depends on what type of image recognition you're looking to do and with what model/training data.

2 hours ago, OhYou_ said:

Surely in 2024 this can be replaced with a bit of machine learning at a far higher resolution and speed.

Actually not necessarily. Finding and scanning, say, a barcode is much easier and faster using traditional computer vision. As a rule of thumb, if you know exactly how to describe what you're looking for then computer vision is faster and more reliable than machine learning.

2 hours ago, OhYou_ said:

If it could process 120 frames per second at 2 MP each while doing basic image recognition, that would be way better.

There are industrial vision systems that can achieve this, though not necessarily on the cheap. Your PC is most likely capable of the same or better, though you'd need specialized software and capture hardware to do it reliably on a live feed (most cameras will return an already encoded and compressed feed, which isn't what you want for computer vision).

 

There are also industrial systems that use some form of machine learning, primarily for tasks where the thing you're looking for could have significant variations. Of course a 4090 could have much higher throughput, but again, you'd need specialized capture hardware... it's often just unnecessary and a waste of energy to have that much processing power in an industrial sorting application where the main bottleneck is capture speed anyway.



17 minutes ago, Sauron said:

if you know exactly how to describe what you're looking for then computer vision is faster and more reliable than machine learning

That's the thing, I only know what is good...
The higher FPS is just my idea that I can keep the image recognition at a relatively simple level and just spam it with tons of frames. If a bunch of surveillance cams can distinguish people from other objects, surely I can do that with product vs. FM.
I'm not really sure the cameras need to be too specialized; any MJPEG feed is fine, since it can be broken into individual frames very quickly and the software just processes each frame as a static image.
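
Something like this is all I was picturing on the software side (a rough Python/OpenCV sketch; the stream URL is made up):

import cv2

# made-up URL; any MJPEG-over-HTTP source works the same way
cap = cv2.VideoCapture("http://camera.local:8080/stream.mjpg")
while True:
    ok, frame = cap.read()  # frame comes back as a decoded BGR array
    if not ok:
        break
    # ...run recognition on this single frame here...
cap.release()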


I'm just not sure how many frames per second is reasonable, or how fast I can generally expect a decision. Maybe I'm under-thinking it and it's better to cut up the image into a ton of chunks and work on each chunk separately with a little bit of overlap, but that raises the workload significantly.
The working frame I was imagining is 8000x256 (about 2 MP), or 2 pixels per 1 mm.

 

Say, for example, a very simple task of only allowing blue circles to pass: how hard is that when you want it to locate the background and anything that isn't a circle? Then ignore any overlapping blue circles, or ones falling at an angle, too. That way you can throw anything else in with the blue circles, and as they pass through the imaging zone, ML can pick out every object it determines isn't the background or a blue circle.
It's something I don't think normal computer vision can do.

 

It's more theoretical at this point; I'm not really looking to solve the problem yet, but rather to get a sense of whether going in this direction is something worth bringing up.


12 minutes ago, OhYou_ said:

That's the thing, I only know what is good...
The higher FPS is just my idea that I can keep the image recognition at a relatively simple level and just spam it with tons of frames. If a bunch of surveillance cams can distinguish people from other objects, surely I can do that with product vs. FM.
I'm not really sure the cameras need to be too specialized; any MJPEG feed is fine, since it can be broken into individual frames very quickly and the software just processes each frame as a static image.

As I mentioned, compression is bad here because you might lose important detail. I'm not sure what you're looking to sort, but many product types can be easily sorted by color, shape, size or position using simple computer vision.

16 minutes ago, OhYou_ said:

Maybe I'm under-thinking it and it's better to cut up the image into a ton of chunks and work on each chunk separately with a little bit of overlap, but that raises the workload significantly.

That's actually more or less how a convolutional neural network works.
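
To illustrate the idea (a toy pure-numpy sketch, not how real frameworks implement it), here's a convolution sliding an overlapping window across an image:

import numpy as np

def conv2d(image, kernel):
    # stride 1, no padding: each output value comes from a small
    # overlapping patch of the input -- exactly the "chunks with overlap" idea
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# simple vertical-edge detector applied to a random test image
edges = conv2d(np.random.rand(16, 16), np.array([[-1.0, 0.0, 1.0]] * 3))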

17 minutes ago, OhYou_ said:

Say, for example, a very simple task of only allowing blue circles to pass: how hard is that when you want it to locate the background and anything that isn't a circle?

Very easy with normal vision.
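
For example, with OpenCV it could look roughly like this (a sketch; the HSV bounds and size/circularity thresholds are placeholders you'd tune for your lighting and parts):

import cv2
import numpy as np

frame = cv2.imread("frame.png")  # one frame from the line camera
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# keep only "blue" pixels -- placeholder bounds to tune
mask = cv2.inRange(hsv, np.array([100, 120, 70]), np.array([130, 255, 255]))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    area = cv2.contourArea(c)
    if area < 50:  # ignore noise specks
        continue
    perimeter = cv2.arcLength(c, True)
    circularity = 4 * np.pi * area / (perimeter * perimeter)  # 1.0 = perfect circle
    if circularity > 0.8:
        print("blue circle, area:", area)
    else:
        print("blue but not a circle -- reject")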

17 minutes ago, OhYou_ said:

Then ignore any overlapping blue circles, or ones falling at an angle, too.

Not very easy with vision or with AI since you could never be sure whether the hidden parts actually make a circle or not. You'd end up with poor reliability.

 

However, if you wanted to sort miscellaneous animal toys an AI system would probably be better than trying to manually explain to the computer what a "cat" is.

20 minutes ago, OhYou_ said:

It's more theoretical at this point; I'm not really looking to solve the problem yet, but rather to get a sense of whether going in this direction is something worth bringing up.

There's a lot of research on this topic because it is quite useful. There are use cases for both traditional computer vision and machine learning, one is not necessarily always better than the other.



7 hours ago, OhYou_ said:

Say, for example, a very simple task of only allowing blue circles to pass: how hard is that when you want it to locate the background and anything that isn't a circle? Then ignore any overlapping blue circles, or ones falling at an angle, too. That way you can throw anything else in with the blue circles, and as they pass through the imaging zone, ML can pick out every object it determines isn't the background or a blue circle.

It's something I don't think normal computer vision can do.

You could select the pixels of the color you want, then use a multi-dimensional array to group those pixels. You could then determine distance/angle from the number of pixels and their movement relative to previous frames, or from other environmental context clues such as luminosity. (See the sketch below.)

 

This is a very basic solution to the problem, but your example would be (relatively) trivial to do.
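
In Python, that grouping step could look something like this (a sketch using scipy's connected-component labeling; the input mask is assumed to already mark color-matched pixels):

import numpy as np
from scipy import ndimage

mask = np.load("blue_mask.npy")  # placeholder: True where a pixel matched the target color

labels, n_groups = ndimage.label(mask)  # group touching pixels into blobs
for i in range(1, n_groups + 1):
    ys, xs = np.nonzero(labels == i)
    # blob size and centroid; track centroids across frames to get movement/angle
    print(f"group {i}: {len(xs)} px at ({xs.mean():.0f}, {ys.mean():.0f})")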


It shouldn't be necessary to have crazy-high-resolution images to do image recognition; in fact, it's usually less optimized as you go above HD resolutions. CV object detection is generally used in a relatively controlled environment, which means the models you train can be tailored to your needs, and the benefits of processing at a higher framerate outweigh the marginal accuracy gains you get from higher resolutions.

 

That said, there have been plenty of experiments/benchmarks of general object detection solutions on various hardware and resolutions. On an RTX 4090, you could get 200+ FPS at HD resolutions: https://learnopencv.com/performance-comparison-of-yolo-models/
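
If you want to sanity-check throughput on your own hardware, a rough benchmark is only a few lines (a sketch using the ultralytics package; model file and frame count are arbitrary):

import time
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # smallest variant; bigger models trade FPS for accuracy
frames = ["frame.png"] * 200  # placeholder: reuse one image to isolate inference speed

start = time.time()
for f in frames:
    model(f, verbose=False)  # run detection on a single frame
print(f"{len(frames) / (time.time() - start):.1f} FPS")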

 

Just an educated guess (I have not read up on any literature about this), but I would imagine the benefits of faster hardware have pushed OD models to become more sophisticated, able to identify more objects more accurately, rather than pushing them to identify things faster or at higher detail. Pushing beyond HD resolutions and hundreds of FPS just isn't as practical as identifying more types of objects better.

