Image Analysis Help

Solved by vm'N:

You could shrink the images down to, say, thumbnail size, then compare them using perceptual hashing.

More info here: http://dsp.stackexchange.com/questions/5995/what-algorithm-does-google-use-for-its-search-by-image-site
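For anyone who wants to see the idea in code, here is a minimal sketch of one simple perceptual hash (an "average hash"). It is not necessarily the algorithm described at the link above. It assumes the image is already decoded into a grayscale grid (a list of rows with values 0-255) that is at least 8x8 pixels; the helper names `shrink`, `average_hash` and `hamming` are made up for this example.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale grid to size x size by averaging blocks.

    Assumes the grid is at least size x size, so no block is empty.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic test data: a 32x32 horizontal gradient, an exact 2x
# nearest-neighbour upscale of it, and an unrelated vertical gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
upscaled = [[original[y // 2][x // 2] for x in range(64)] for y in range(64)]
other = [[y * 8 for _ in range(32)] for y in range(32)]

same_distance = hamming(average_hash(original), average_hash(upscaled))
diff_distance = hamming(average_hash(original), average_hash(other))
print(same_distance, diff_distance)
```

In this synthetic case the resized copy hashes identically (distance 0) while the unrelated image is far away. Real resized JPEGs will usually differ by a few bits, so in practice you compare against a small distance threshold instead of requiring equality.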

To start, I have to be a bit vague because of an NDA. I'm building a piece of software for real estate and cannot say much about its inner workings.

 

(shameless self-promotion: https://wovax.com/real-estate/)

 

I need to host images. A LOT of images. We are looking at 2000-3000 galleries of several HUNDRED images each.

The problem I'm having is that this software is going to be sold to many different organizations, each with a different data standard. For example, some organizations upload a single large image of each "object", and some upload multiple sizes of each image to their MLS boards. This really sucks, because the software I have makes multiple copies of each image at different sizes. In some cases, this means making different sizes of the different sizes of images.

If a realtor has 35 images, times 5 sizes, that is 175 images. Take that, and I make an additional 4 sizes of each of those 175 images, and you get 700 images. This does not sound too bad, until you realize that it is not uncommon for some realtors to have 50+ images, resulting in 1000+ images PER LISTING.
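To make the arithmetic above easy to replay with other numbers, here is a tiny sketch. The function name `stored_images` and its parameter names are just labels for this example; the defaults (5 uploaded sizes, 4 generated sizes) are the figures from the paragraph above.

```python
def stored_images(listing_photos, uploaded_sizes=5, generated_sizes=4):
    """Images stored per listing when every uploaded size is resized again."""
    return listing_photos * uploaded_sizes * generated_sizes


print(stored_images(35))  # 35 * 5 * 4 = 700
print(stored_images(50))  # 50 * 5 * 4 = 1000
```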

 

There is no easy way to tell from the URLs we are given whether multiple image sizes are in use. So, in PHP (or any other language I can pipe through CGI), what is a good method of extracting image key-points for comparison?

I would simply hash each image and compare the hashes to find duplicates, but the different image sizes make that impossible.
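A quick illustration of why a plain hash does not work here. This is only a sketch in Python (any CGI-capable language would do): it uses raw grayscale values as a stand-in for decoded image data, and the helpers `digest` and `upscale_2x` are invented for the example.

```python
import hashlib


def digest(pixels):
    """SHA-256 over the raw grayscale bytes of a pixel grid (values 0-255)."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()


def upscale_2x(pixels):
    """Nearest-neighbour 2x upscale: every pixel becomes a 2x2 block."""
    return [[v for v in row for _ in (0, 1)] for row in pixels for _ in (0, 1)]


small = [[10, 200], [90, 30]]
large = upscale_2x(small)  # same picture, 4x4 instead of 2x2

print(digest(small) == digest(large))                    # False
print(digest(small) == digest([r[:] for r in small]))    # True
```

The two grids show the same picture, but the bytes differ, so the digests differ. Only a byte-for-byte identical copy matches.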

 

Any suggestions? Thanks in advance!

~Judah

https://linustechtips.com/topic/402987-image-analysis-help/

You could shrink the images down to, say, thumbnail size, then compare them using perceptual hashing.

More info here: http://dsp.stackexchange.com/questions/5995/what-algorithm-does-google-use-for-its-search-by-image-site

 

Thanks a ton!

I don't know how much I can really do. The computational power needed to analyze so many images would be pretty rough, but I'll definitely give your suggestion a shot

~Judah


perceptual hashing

Sounds amazing

Thanks a ton!

I don't know how much I can really do. The computational power needed to analyze so many images would be pretty rough, but I'll definitely give your suggestion a shot

Actually this should be quite lightweight. You only need to hash the image once and then store (and index) the hash together with the image url/id in your database.

When you insert a new image, you hash it and then check if there is already a similar hash in the db, which is a very fast lookup.

I gotta try this thing out, sounds cool
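A rough sketch of the hash-once-then-look-up idea, using Python's built-in `sqlite3` as a stand-in for whatever database you actually run. The table name `image_hash`, the helper names, the example URLs and the distance threshold of 5 bits are all assumptions for illustration. Hashes are stored as 16-character hex strings because a full 64-bit unsigned value does not fit SQLite's signed INTEGER type.

```python
import sqlite3


def open_store():
    """In-memory table mapping image URL -> 64-bit perceptual hash (as hex)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE image_hash (url TEXT PRIMARY KEY, phash TEXT NOT NULL)")
    db.execute("CREATE INDEX idx_phash ON image_hash (phash)")
    return db


def to_hex(phash):
    return f"{phash:016x}"


def find_exact(db, phash):
    """Indexed lookup: only finds hashes that match bit for bit."""
    rows = db.execute(
        "SELECT url FROM image_hash WHERE phash = ?", (to_hex(phash),)
    ).fetchall()
    return [r[0] for r in rows]


def find_near(db, phash, max_distance=5):
    """Full scan comparing Hamming distance; fine for a sketch, not indexed."""
    matches = []
    for url, stored in db.execute("SELECT url, phash FROM image_hash"):
        if bin(int(stored, 16) ^ phash).count("1") <= max_distance:
            matches.append(url)
    return matches


def insert_if_new(db, url, phash, max_distance=5):
    """Store the hash unless a similar one exists; return the similar URLs."""
    dupes = find_near(db, phash, max_distance)
    if not dupes:
        db.execute(
            "INSERT INTO image_hash (url, phash) VALUES (?, ?)", (url, to_hex(phash))
        )
    return dupes


db = open_store()
first = insert_if_new(db, "https://example.com/a_large.jpg", 0x0F0F0F0F0F0F0F0F)
second = insert_if_new(db, "https://example.com/a_thumb.jpg", 0x0F0F0F0F0F0F0F0E)
third = insert_if_new(db, "https://example.com/b_large.jpg", 0x00000000FFFFFFFF)
print(first, second, third)
```

The exact-match query uses the index and is as fast as described above. The "similar hash" check in this sketch is a linear scan, which is usually fine within one listing's gallery; across millions of rows you would want some way to narrow the candidates first.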

