Image Analysis Help

Judahnator
Solved by vm'N

To start, I have to be a bit vague due to an NDA I have signed. I'm building a piece of software for real estate and cannot say much about its inner workings.

 

(shameless self-promotion: https://wovax.com/real-estate/)

 

I need to host images. A LOT of images. We are looking at 2000-3000 galleries of several HUNDRED images each.

The problem I'm having is that this software is going to be sold to many different organizations, each with a different data standard. For example, some organizations upload a single large image of each "object", and some upload multiple sizes of each image to their MLS boards. This really sucks, because my software makes multiple copies of each image at different sizes. In some cases, this means making different sizes of the different sizes of images.

If a realtor has 35 images, times 5 sizes, that is 175 images. Take that, and I make an additional 4 sizes of each of those 175 images, and you get 700 images. This does not sound too bad, until you realize that it is not uncommon for some realtors to have 50+ images, resulting in 1000+ images PER LISTING.

 

There is no easy way to tell from the URL we are given whether multiple image sizes are in use. So, in PHP (or any other language I can pipe through CGI), what is a good method of extracting image key-points for comparison?

I would simply hash each image file and compare the hashes to find duplicates, but differently sized copies of the same picture hash to different values, so that won't work.

 

Any suggestions? Thanks in advance!

~Judah

You could shrink the images down to, say, thumbnail size, then compare them using perceptual hashing.

More info here: http://dsp.stackexchange.com/questions/5995/what-algorithm-does-google-use-for-its-search-by-image-site
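
For anyone wanting to see roughly what that looks like, here is a minimal sketch of one simple perceptual hash (an average hash, sometimes called aHash). It may not be the exact method the linked answer describes. It is in Python rather than PHP, and it assumes the image has already been decoded into a grid of grayscale values (nested lists, 0-255); decoding real files would need an image library, which is left out. The names `shrink`, `average_hash`, and `hamming` are made up for this example.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale grid (list of rows) to size x size by block averaging.

    Assumes the input is at least size x size.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: one bit per cell, set if the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for decoded images: a diagonal gradient, a half-size
# copy of it, and an unrelated horizontal gradient.
big = [[(x + y) * 2 for x in range(64)] for y in range(64)]
half = shrink(big, 32)
other = [[(63 - x) * 4 for x in range(64)] for y in range(64)]

print(hamming(average_hash(big), average_hash(half)))
print(hamming(average_hash(big), average_hash(other)))
```

Two sizes of the same photo should land a small Hamming distance apart, while unrelated photos usually differ in many bits. The cutoff for "same image" is something to tune on real data.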

 

Thanks a ton!

I don't know how much I can really do. The computational power needed to analyze so many images would be pretty rough, but I'll definitely give your suggestion a shot.

~Judah

Perceptual hashing sounds amazing.

Actually this should be quite lightweight. You only need to hash each image once and then store (and index) the hash together with the image URL/ID in your database.

When you insert a new image, you hash it and then check whether there is already a similar hash in the DB, which is a very fast lookup as long as the hash column is indexed.
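
A rough sketch of that store-and-check flow, using Python's built-in `sqlite3` module as a stand-in for whatever database is actually in use. The table name `image_hashes`, the helper names, and the `max_distance` parameter are all hypothetical. It assumes a 64-bit integer hash, stored as 16 hex characters because SQLite integers are signed. Only the exact-match check uses the index; the optional near-match check here is a plain scan over every row, which is an assumption of this sketch and would want something smarter at scale.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a table mapping image URLs to perceptual hashes."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS image_hashes ("
        "url TEXT PRIMARY KEY, phash TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_phash ON image_hashes (phash)")
    return db


def find_duplicate(db, phash, max_distance=0):
    """Return the URL of a stored image with a matching hash, or None."""
    key = format(phash, "016x")
    # Exact match: served by the index on phash.
    row = db.execute(
        "SELECT url FROM image_hashes WHERE phash = ? LIMIT 1", (key,)
    ).fetchone()
    if row:
        return row[0]
    # Near match: linear scan comparing Hamming distance (not indexed).
    if max_distance > 0:
        for url, stored in db.execute("SELECT url, phash FROM image_hashes"):
            if bin(int(stored, 16) ^ phash).count("1") <= max_distance:
                return url
    return None


def insert_if_new(db, url, phash, max_distance=0):
    """Store the image unless a duplicate exists; return the duplicate's URL or None."""
    existing = find_duplicate(db, phash, max_distance)
    if existing is None:
        db.execute(
            "INSERT OR REPLACE INTO image_hashes (url, phash) VALUES (?, ?)",
            (url, format(phash, "016x")),
        )
        db.commit()
    return existing


db = open_store()
print(insert_if_new(db, "listing1/front.jpg", 0xFF00))        # stored as new
print(insert_if_new(db, "listing1/front_small.jpg", 0xFF00))  # reported as a duplicate
```

The hashing cost is paid once per image at insert time; after that, every duplicate check is just a database query.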

I gotta try this thing out, sounds cool
