Jump to content

c++ how to check for duplicates in array

Pachuca

Need help with a coding project. I'm making a bunch of random numbers into an array. I would like to check those numbers for duplicates and identify which numbers are duplicates and how many times they repeat. I"m having some trouble with this. Can anyone help pls?

 

edit: fixed code a bit

#include <iostream>
#include <cstdlib> 
#include <ctime>

using namespace std;

int main() {
	int numbers, arrayNumbers[35], arrayCheck[35], arrayDuplicates[35], i, j = 0, x = 0, count = 0;

	srand(time(0));

	for (i = 0; i < 35; i++) {
		numbers = rand() % 5 + 1;
		arrayNumbers[i] = numbers;
		arrayCheck[j] = arrayNumbers[i];
	//	j++;
	}
	for (j = 0; j < 35; j++) {
		if (arrayCheck[j] / arrayNumbers[j] == 1) {
			for (i = 0; i < 35; i++)
				arrayDuplicates[j] = arrayCheck[j];
			cout << "This is a duplicate number: " << arrayCheck[j] << " and it repeats this many times: " << arrayDuplicates[j] endl;
		}
	}
	return 0;
}

 

Link to comment
Share on other sites

Link to post
Share on other sites

What happens when you run the code?  Do you error out or does it ever cout?  I'm a C# guy,  so my logic might be a little off from yours but do you by chance get an Array out of bounds error.  I'm on my phone atm, so Im curious about the "<"  change that to =< (less than or = to) or what ever the C++ equivalent is...

 

before using

numbers = rand() % 5 + 1;

 

I would make all your numbers equal a single number to ensure your coding is working. So Numbers = 1.  This in theory should cout >> like crazy.

Link to comment
Share on other sites

Link to post
Share on other sites

26 minutes ago, Pachuca said:

 



	for (i = 0; i < 35; i++) {
		numbers = rand() % 5 + 1;
		arrayNumbers[i] = numbers;
		arrayNumbers[i] = arrayCheck[i];
	}
	

 

You're overwriting the arrayNumbers with a null since arrayCheck. So that's at least one problem. 

 

 

26 minutes ago, Pachuca said:

 



	for (j = 0; j < 35; j++) {
		if (arrayCheck[j] / arrayNumbers = 1) {
			for(i = 0; i < 35; i++)
			arrayDuplicates[j] = arrayCheck[j];
			cout >> "This is a duplicate number: " << arrayCheck[j] << " and it repeats this many times: " << arrayDuplicates[j] endl;
		}
	}

You're dividing an int (arraycheck[j], by an int[]) in the if statement. 

 

Honestly, I would take a different approach and first fill the numbers array. Then sort it, and then iterate over the sorted array and just check if arrayNumbers=arrayNumbers[i+1] and then if it is iterate a counter, and then store/print the number and the count of that number.

 

 

 

int main(){

      int numbers[35] ; 

      for(int i=0; i<35; i++){

            numbers[i]=rand() % 5 + 1;

      }

      //sort the array

     int lastNum = numbers[0];

     int count = 1;

     for(int i=1; i<35; i++){

          if(numbers[i] == lastNum){

                 count++; 

           } else { 

                 printf("The number: %d appears %d times in the array.\n", numbers[i-1], count);

                 count=1; 

           }      

     }      

}

 

PSU Tier List | CoC

Gaming Build | FreeNAS Server

Spoiler

i5-4690k || Seidon 240m || GTX780 ACX || MSI Z97s SLI Plus || 8GB 2400mhz || 250GB 840 Evo || 1TB WD Blue || H440 (Black/Blue) || Windows 10 Pro || Dell P2414H & BenQ XL2411Z || Ducky Shine Mini || Logitech G502 Proteus Core

Spoiler

FreeNAS 9.3 - Stable || Xeon E3 1230v2 || Supermicro X9SCM-F || 32GB Crucial ECC DDR3 || 3x4TB WD Red (JBOD) || SYBA SI-PEX40064 sata controller || Corsair CX500m || NZXT Source 210.

Link to comment
Share on other sites

Link to post
Share on other sites

10 minutes ago, djdwosk97 said:

You're overwriting the arrayNumbers with a null since arrayCheck. So that's at least one problem. 

 

 

You're dividing an int (arraycheck[j], by an int[]) in the if statement. 

 

Honestly, I would take a different approach and first fill the numbers array. Then sort it, and then iterate over the sorted array and just check if arrayNumbers=arrayNumbers[i+1] and then if it is, store/print the number and the count of that number.

it should be

if (arrayCheck[j] / arrayNumbers[i] = 1) 

I was thinking it would check against each element in the array and if its a duplicate it would be divisible by itself so it should equal 1, otherwise its not a duplicate. Anyway it's not working for me so Ill try what you suggested.

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, Pachuca said:

it should be


if (arrayCheck[j] / arrayNumbers[i] = 1) 

I was thinking it would check against each element in the array and if its a duplicate it would be divisible by itself so it should equal 1, otherwise its not a duplicate. Anyway it's not working for me so Ill try what you suggested.

Wouldnt you get the same result from

If (arraycheck[j] = arrayNumbers[j])            ?  no need to divide.

 

Link to comment
Share on other sites

Link to post
Share on other sites

21 minutes ago, Donny_Chen said:

Wouldnt you get the same result from

If (arraycheck[j] = arrayNumbers[j])            ?  no need to divide.

 

yes, but i was thinking to check each element against each other. It doesn't really matter, both ways would work. 

Link to comment
Share on other sites

Link to post
Share on other sites

14 minutes ago, Pachuca said:

yes, but i was thinking to check each element against each other. It doesn't really matter, both ways would work. 

Except you're using the assignment operator "=" instead of the comparing operator "=="

 

EDIT: Maybe I should look at original post (>_<)

[Out-of-date] Want to learn how to make your own custom Windows 10 image?

 

Desktop: AMD R9 3900X | ASUS ROG Strix X570-F | Radeon RX 5700 XT | EVGA GTX 1080 SC | 32GB Trident Z Neo 3600MHz | 1TB 970 EVO | 256GB 840 EVO | 960GB Corsair Force LE | EVGA G2 850W | Phanteks P400S

Laptop: Intel M-5Y10c | Intel HD Graphics | 8GB RAM | 250GB Micron SSD | Asus UX305FA

Server 01: Intel Xeon D 1541 | ASRock Rack D1541D4I-2L2T | 32GB Hynix ECC DDR4 | 4x8TB Western Digital HDDs | 32TB Raw 16TB Usable

Server 02: Intel i7 7700K | Gigabye Z170N Gaming5 | 16GB Trident Z 3200MHz

Link to comment
Share on other sites

Link to post
Share on other sites

1 hour ago, DeadEyePsycho said:

Except you're using the assignment operator "=" instead of the comparing operator "=="

 

EDIT: Maybe I should look at original post (>_<)

Even with the code he has that wouldn't work. He's setting arraynumber to be a random number, then setting arraycheck[j] to that same number (and then never iterating j), so arraycheck[0] will consist of a single int and it will match whatevers in arraynumbers[34]. What he should do is not use arrayCheck at all and use nested for loops if he doesn't want to sort the array. 

 

 


for(int i=0; i<size(arrayNumber); i++}{

     for(int j=i+1; j<size(arrayNumber); j++){

            if(arrayNumber[i]==arrayNumber[j])

                 // increment the count for that number

                 // set arrayNumber[j] = NULL

     }

}

 

 

If I wasn't going to sort the array, then I'd use the numbers array, a unique numbers array, and a count array (or just a 2D array that's 35x2 where the first column would be the number and the second column the count). Set all of the numbers in my number array. Then iterate over the numbers array and add the first element to my unique array and initialize my count array to 1. Then iterate over the numbers array and for any duplicate of the i'th number I would increment the count in the count array (and set the value of the ith element in the numbers array to NULL so I wouldn't ever add it to my unique array. 

PSU Tier List | CoC

Gaming Build | FreeNAS Server

Spoiler

i5-4690k || Seidon 240m || GTX780 ACX || MSI Z97s SLI Plus || 8GB 2400mhz || 250GB 840 Evo || 1TB WD Blue || H440 (Black/Blue) || Windows 10 Pro || Dell P2414H & BenQ XL2411Z || Ducky Shine Mini || Logitech G502 Proteus Core

Spoiler

FreeNAS 9.3 - Stable || Xeon E3 1230v2 || Supermicro X9SCM-F || 32GB Crucial ECC DDR3 || 3x4TB WD Red (JBOD) || SYBA SI-PEX40064 sata controller || Corsair CX500m || NZXT Source 210.

Link to comment
Share on other sites

Link to post
Share on other sites

2 hours ago, djdwosk97 said:

Even with the code he has that wouldn't work. He's setting arraynumber to be a random number, then setting arraycheck[j] to that same number (and then never iterating j), so arraycheck[0] will consist of a single int and it will match whatevers in arraynumbers[34]. What he should do is not use arrayCheck at all and use nested for loops if he doesn't want to sort the array. 

 

 


for(int i=0; i<size(arrayNumber); i++}{

     for(int j=i+1; j<size(arrayNumber); j++){

            if(arrayNumber[i]==arrayNumber[j])

                 // increment the count for that number

                 // set arrayNumber[j] = NULL

     }

}

 

 

If I wasn't going to sort the array, then I'd use the numbers array, a unique numbers array, and a count array (or just a 2D array that's 35x2 where the first column would be the number and the second column the count). Set all of the numbers in my number array. Then iterate over the numbers array and add the first element to my unique array and initialize my count array to 1. Then iterate over the numbers array and for any duplicate of the i'th number I would increment the count in the count array (and set the value of the ith element in the numbers array to NULL so I wouldn't ever add it to my unique array. 

Even if you dont want the array to get sorted, you could just create a copy and sort that.

This is more elegant and has a better complexity (copy=O(N), sort=O(N log N)).

 

Imho the simplest method would be to loop over the array and store how many times each number occurs in a std::unordered_map.

This still has O(N log N) complexity (although the sorting method will perform better in practice because of how std::unordered_map has to be implemented).

I dont have access to my laptop right now but I can write a code sample later tonight.

 

EDIT:

#include <iostream>
#include <random>
#include <array>
#include <algorithm>
#include <unordered_map>

int main()
{
	// Allocate an array of 35 integer values on the stack
	std::array<int, 35> randomValues;

	// Use a random number generator "device" to generate a seed for the Mersenne Twister Psuedo Random Number Generator.
	// Then fill the array of random values with a uniform distribution in the range of 1 to 5 (sampling from the MT PRNG).
	std::random_device rd;
	std::mt19937 mt(rd());
	std::uniform_int_distribution<int> dist(1, 5);
	std::generate(randomValues.begin(), randomValues.end(), [&]() { return dist(mt); });

	std::cout << "Result using sorting:" << std::endl;
	// Option 1
	// Make a copy of the random numbers, sort them, and loop over the sorted results while counting the number of occurences.
	std::array<int, 35> sortedRandomValues;
	std::copy(randomValues.begin(), randomValues.end(), sortedRandomValues.begin());
	std::sort(sortedRandomValues.begin(), sortedRandomValues.end());
	int prevValue = -1, occurences = 0;
	for (int value : sortedRandomValues)
	{
		if (value == prevValue)
		{
			occurences++;
		} else {
			if (occurences > 1)
				std::cout << "This is a duplicate number: " << prevValue << " and it repeats this many times: " << occurences << std::endl;
			occurences = 1;
			prevValue = value;
		}
	}
	if (occurences > 1)
		std::cout << "This is a duplicate number: " << prevValue << " and it repeats this many times: " << occurences << std::endl;




	std::cout << std::endl << "Result using a hashmap" << std::endl;
	// Option 2
	// Loop through the numbers and store them in a hashmap. A hash map is a dictionary mapping a key to a value.
	// The random value is the key and the number of occurences is the value.
	std::unordered_map<int, int> occurenceCount;
	for (int value : randomValues)
	{
		auto& entry = occurenceCount[value];
		if (entry)
			entry++;
		else
			entry = 1;
	}

	for (auto pair : occurenceCount)
		if (pair.second > 1)
			std::cout << "This is a duplicate number: " << pair.first << " and it repeats this many times: " << pair.second << std::endl;

    return 0;
}

I feel like the loop in the sort version could be more elegant (do while maybe?).

Personally, I like the second option the most. It is simple, elegant and relatively fast.

Note that sorting will always return the random numbers in increasing order.

If you want the same behavior from the hash map version than you'll have to replace std::unordererd_map by std::map (which I think is not actually a hash map but a binary search tree, correct me if Im wrong).

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro

Link to comment
Share on other sites

Link to post
Share on other sites

It would seem simple to me.

 

1. Have the array with the numbers

2. Have a second array which holds the number of times the number at a particular position in the first array is found in the array

3. Every time you detect the number a second time in the array, replace that number with some value you know for example -1 (since it seems you're using signed variables but only positive random numbers) and you'll skip searching for duplicates for that number when its time comes

 

Here's something i wrote super quick in php the last two minutes, should be fairly easy to convert to c++  :

 

<?php

$numbercount = 49; // from 0 to 49 = 50 in total

$numbers = array(); // holds the numbers
$stats = array();  // holds the number of times each number is detected

// generate some random numbers

for ($i=0;$i<=$numbercount;$i++) {
    $numbers[$i] = mt_rand(0,100); // number between 0 and 100 inclusive
    $stats[$i] = 1; // this number is at least one in the array
}

// let's make sure there are some duplicates

$numbers[15] = $numbers[3];
$numbers[20] = $numbers[5];
$numbers[21] = $numbers[5];

// just print the numbers for debugging purposes 
for ($i=0;$i<=$numbercount;$i++) { echo $numbers[$i]." "; }
echo "\n";


$numbers_duplicate = 0;
for ($i=0;$i<=$numbercount-1;$i++) {
if ($numbers[$i]!==-1) {
        for ($j=$i+1;$j<=$numbercount;$j++) {
            if ($numbers[$i]==$numbers[$j]) {
                if ($stats[$i]==1) { $numbers_duplicate++; }
                $stats[$i]++; 
                $numbers[$j] = -1;
            }
        }
    }
}

// show the array again to see the duplicate numbers gone 

for ($i=0;$i<=$numbercount;$i++) { echo $numbers[$i]." "; }
echo "\n";

if ($numbers_duplicate > 0 )  {
    echo "There are $numbers_duplicate numbers to show: \n";
    for ($i=0;$i<=$numbercount;$i++) {
        if ($stats[$i]>1) echo $numbers[$i].': '.$stats[$i]."\n";
    }
} else {
    echo "There are no duplicate numbers to show.\n";
}
?>

 

Example of output for this php code:

 

"D:\Programs\php7\php.exe" "D:\Programs\php\file.php"
41 84 56 44 78 88 80 38 3 18 37 14 85 79 97 44 81 16 17 95 88 88 60 88 35 92 26 62 43 1 13 88 73 12 92 9 20 80 72 6 39 62 75 83 85 67 10 38 43 80 
41 84 56 44 78 88 80 38 3 18 37 14 85 79 97 -1 81 16 17 95 -1 -1 60 -1 35 92 26 62 43 1 13 -1 73 12 -1 9 20 -1 72 6 39 -1 75 83 -1 67 10 -1 -1 -1 
There are 8 numbers to show: 
44: 2
88: 5
80: 3
38: 2
85: 2
92: 2
62: 2
43: 2
Done.

(the 3rd and 5th number starting with 0 in array will be duplicates every time, because i said so in the code)

 

Of course, this assumes you can mess with the array and replace some values with -1. But instead of replacing the value in that array, you could just change the value in stats array to -1 (or whatever)

 

 

Link to comment
Share on other sites

Link to post
Share on other sites

7 hours ago, mathijs727 said:

<snip>

Thumbs up for using the random library. The unordered_map approach is the canonical way of doing this, although in many cases a simple array is used instead. For a really large dataset sorting the data first and then applying the unordered_map approach might be faster. That's because sorting is pretty linear (some algorithms at least) and thus cache-friendly; and after sorting, the accesses to the unordered_map will also be linear, whereas without sorting those accesses would be all over the place, potentially leading to a lot of cache-misses. The only way to be sure is to test it I guess.

 

11 hours ago, Pachuca said:

it should be


if (arrayCheck[j] / arrayNumbers[i] = 1) 

I was thinking it would check against each element in the array and if its a duplicate it would be divisible by itself so it should equal 1, otherwise its not a duplicate. Anyway it's not working for me so Ill try what you suggested.

The first problem with that is '=' is the assignment operator. This code tries to assign the literal value 1 to the r-value result of 'arrayCheck[j] / arrayNumbers[ i] ' - which should result in a compiler error along the lines of "l-value required as left operand" or "cannot assign to r-value".

'==' is the quality operator, so to compare 2 things are equal:

if (x == 1)
{
	//Launch missiles.
}

Secondly, dividing 2 numbers and then checking if the result is 1 is a rather convoluted way of comparison, and has some problems:

  • What if arrayNumbers[ i] is zero ? (Perhaps not possible here, but in general).
  • A integer can only hold whole numbers, so 3 / 2 is rounded down to 1 and would pass your "equality" test.
  • If those numbers were floating point then the result would be floating point as well. Not all numbers can be accurately represented in floating point format, resulting in a nearby approximation which will fail any precise comparison to a literal. For this reason floating point numbers are compared to a range (bigger then x, but smaller then y).

 

 

 

 

 

 

Link to comment
Share on other sites

Link to post
Share on other sites

3 hours ago, Unimportant said:

Thumbs up for using the random library. The unordered_map approach is the canonical way of doing this, although in many cases a simple array is used instead. For a really large dataset sorting the data first and then applying the unordered_map approach might be faster. That's because sorting is pretty linear (some algorithms at least) and thus cache-friendly; and after sorting, the accesses to the unordered_map will also be linear, whereas without sorting those accesses would be all over the place, potentially leading to a lot of cache-misses. The only way to be sure is to test it I guess.

 

The first problem with that is '=' is the assignment operator. This code tries to assign the literal value 1 to the r-value result of 'arrayCheck[j] / arrayNumbers[ i] ' - which should result in a compiler error along the lines of "l-value required as left operand" or "cannot assign to r-value".

'==' is the quality operator, so to compare 2 things are equal:


if (x == 1)
{
	//Launch missiles.
}

Secondly, dividing 2 numbers and then checking if the result is 1 is a rather convoluted way of comparison, and has some problems:

  • What if arrayNumbers[ i] is zero ? (Perhaps not possible here, but in general).
  • A integer can only hold whole numbers, so 3 / 2 is rounded down to 1 and would pass your "equality" test.
  • If those numbers were floating point then the result would be floating point as well. Not all numbers can be accurately represented in floating point format, resulting in a nearby approximation which will fail any precise comparison to a literal. For this reason floating point numbers are compared to a range (bigger then x, but smaller then y).

 

 

 

 

 

 

thanks, already been through this with someone else. I could change it to a double, but that would just be bad code and I like to keep things simple so I will use arrayCheck[j] == arrayNumbers.

Link to comment
Share on other sites

Link to post
Share on other sites

5 hours ago, Unimportant said:

Thumbs up for using the random library. The unordered_map approach is the canonical way of doing this, although in many cases a simple array is used instead. For a really large dataset sorting the data first and then applying the unordered_map approach might be faster. That's because sorting is pretty linear (some algorithms at least) and thus cache-friendly; and after sorting, the accesses to the unordered_map will also be linear, whereas without sorting those accesses would be all over the place, potentially leading to a lot of cache-misses. The only way to be sure is to test it I guess.

 

The first problem with that is '=' is the assignment operator. This code tries to assign the literal value 1 to the r-value result of 'arrayCheck[j] / arrayNumbers[ i] ' - which should result in a compiler error along the lines of "l-value required as left operand" or "cannot assign to r-value".

'==' is the quality operator, so to compare 2 things are equal:


if (x == 1)
{
	//Launch missiles.
}

Secondly, dividing 2 numbers and then checking if the result is 1 is a rather convoluted way of comparison, and has some problems:

  • What if arrayNumbers[ i] is zero ? (Perhaps not possible here, but in general).
  • A integer can only hold whole numbers, so 3 / 2 is rounded down to 1 and would pass your "equality" test.
  • If those numbers were floating point then the result would be floating point as well. Not all numbers can be accurately represented in floating point format, resulting in a nearby approximation which will fail any precise comparison to a literal. For this reason floating point numbers are compared to a range (bigger then x, but smaller then y).

 

 

 

 

 

 

After sorting, there is no need to using a hashmap anymore.

 

Also, unordered_map is slow/cache unfriendly because the standard requires access to buckets.

This practically forces the implementation to use a linked list chain for conflicting hashes.

Linked lists are obviously very bad for cache coherency, and even if there are no conflicting hashes, you now have to go through one extra indirection.

If you really care about performance than you probably need to look for a library that provides a hashmap with open addressing.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro

Link to comment
Share on other sites

Link to post
Share on other sites

Tell me, what is the point of using `j` in your first loop if it's always 0 within that loop? You have some code commented out after that (`j++;`), but this would effectively make `j` redundant as `j` increments at the same interval as `i`, contains the same initial value and thus will contain the same value as `i` throughout your loop. You would basically be making a copy of `arrayNumbers` which is named `arrayCheck`, except you've merged the copy operation with the initial population.

I think you should think of what else you can use, other than `i` and `j`, as an index for `arrayCheck` in this statement:

arrayCheck[j] = arrayNumbers[i];



Also, think about what else you can use as the right hand side... or more precisely, what you intended to use, because as it currently stands there's no reasonable logic explaining why you need to make an identical copy of your original array to solve this problem. To be clear:

arrayCheck[/* put something other than j here */] = /* put something other than arrayNumbers[i] here */;

Link to comment
Share on other sites

Link to post
Share on other sites

The second loop doesn't make any logical sense, either. It seems like you're just guessing!

Is there a more pertinent question on your mind, perhaps?

Unfortunately, I can't read minds, so I'll have to assume you're confused about everything (sorry if that seems offensive).

The naive approach at finding duplicates is as a nested loop; you have an outer loop and an inner loop.

The outer loop should iterate an integer index from 0 to the end of the collection. Let's call that integer index x.

The inner loop should iterate an integer index from x+1 to the end of the collection. Let's call this one y.

The outer shouldn't contain any more than your inner loop; these loops are literally nested one on top of the other with no logic between the starts and ends of both loops, like:
 

for (size_t x = 0; x < sizeof array / sizeof *array; x++) {

    for (size_t y = x + 1; y < sizeof array / sizeof *array; y++) {
        if (array[x] == array[y]) {
            freq[array[x]]++;
            break;
        }
    }

}



That's about all your second loop should contain. You could turn this into one loop, at your peril. Probably best to leave it nested as two.

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×