Jump to content

Is there a way to make this code even faster?(C++)

pipnina

I'm making a prime number generator, my lecturer has said that the fastest code gets £20 (GBP) of his own money (With the catch that he's also in the contest).

I think my code is pretty quick, but wondered if there were any further ways to optimize it.

 

    //Determining if a number is prime
    for(int primeSearch=3; primeSearch<COUNT_TO; primeSearch+=2){  //Because we skip even numbers, we know we don't need to test the prime candidates by dividing them by two or ten.
        bool primeFound = false;
        for(unsigned int vectorScan=0; vectorScan<=primeVector.size(); vectorScan++){                   //For every prime number in the vector:
            if(primeSearch%primeVector[vectorScan]==0 && primeSearch!=primeVector[vectorScan]){break;}  //Make sure that the candidate is not divisible by any of the previous primes.
            if(primeVector[vectorScan]>(primeSearch/2)){                                                //If the prime number isn't disproved by the time we've started dividing by-
                primeFound=true;                                                                        //-more than half of the candidate's value, we assume it's prime.
                break;
            }
        }
        if(primeFound == true){
            primeVector.push_back(primeSearch);                                                        //Add the prime number to the list
        }
    }

Currently it executes between 100 and 200 milliseconds when generating prime numbers below a value of 100'000 (In release mode, on a 6700k)

 

Thanks :)

    ~pip

Link to comment
Share on other sites

Link to post
Share on other sites

I'm thinking multithreading, but I can't help code-wise in this case

Computer Case: NZXT S340 || CPU: AMD Ryzen 5 1600 || Cooler: CM Hyper212 Evo || MoBo: MSI B350 Mortar || RAM Vengeance LPX 2x8GB 3200MHz || PSU: Corsair CX600 || SSD: HyperX Fury 120GB & 240GB || HDD: WD Blue 1TB + 1TB 2.5'' backup drive || GPU: Sapphire Nitro+ RX 580 4GB

Laptop 1 HP x360 13-u113nl

Laptop Lenovo z50-75 with AMD FX-7500 || OS: Windows 10 / Ubuntu 17.04

DSLR Nikon D5300 w/ 18-105mm lens

Link to comment
Share on other sites

Link to post
Share on other sites

Sieve of Eratosthenes. O(n loglog n). There are many ways to improve the efficiency afterwards.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
Share on other sites

Link to post
Share on other sites

when you get to the point where you say it's prime here:

58 minutes ago, pipnina said:

if(primeVector[vectorScan]>(primeSearch/2))

you shouldn't compare to primeSearch/2, instead compare to the square root of primeSearch.

 

Also

55 minutes ago, Cryosec said:

I'm thinking multithreading, but I can't help code-wise in this case

multi-threading wouldn't be an option in this case since you are dependent of the previous execution of the loop in order to execute the next one, if you use multi-threading you might skip a prime, or wrongly categorize a number as prime, specially early on.

The best way to measure the quality of a piece of code is "Oh F*** "s per line

Link to comment
Share on other sites

Link to post
Share on other sites

4 minutes ago, espurritado said:

multi-threading wouldn't be an option in this case since you are dependent of the previous execution of the loop in order to execute the next one, if you use multi-threading you might skip a prime, or wrongly categorize a number as prime, specially early on.

I've learned something new today :D 

Computer Case: NZXT S340 || CPU: AMD Ryzen 5 1600 || Cooler: CM Hyper212 Evo || MoBo: MSI B350 Mortar || RAM Vengeance LPX 2x8GB 3200MHz || PSU: Corsair CX600 || SSD: HyperX Fury 120GB & 240GB || HDD: WD Blue 1TB + 1TB 2.5'' backup drive || GPU: Sapphire Nitro+ RX 580 4GB

Laptop 1 HP x360 13-u113nl

Laptop Lenovo z50-75 with AMD FX-7500 || OS: Windows 10 / Ubuntu 17.04

DSLR Nikon D5300 w/ 18-105mm lens

Link to comment
Share on other sites

Link to post
Share on other sites

5 minutes ago, espurritado said:

when you get to the point where you say it's prime here:

you shouldn't compare to primeSearch/2, instead compare to the square root of primeSearch.

 

Also

multi-threading wouldn't be an option in this case since you are dependent of the previous execution of the loop in order to execute the next one, if you use multi-threading you might skip a prime, or wrongly categorize a number as prime, specially early on.

Thanks for that one! I was just trying to tweak that primeSearch/2 value.

Link to comment
Share on other sites

Link to post
Share on other sites

30 minutes ago, gabrielcarvfer said:

@pipnina he asked for real primes? When calculating big primes (pseudoprimes), people usually use Fermat or Miller-Rabin/Solavay-Strassen to check for primality of big numbers.

 

You can use multithreading if checking for real primes, but not in a general way, as espurritado just mentioned, because there is data dependency. You could, for example, use a single thread to find early primes, and then use multiple thread to check for later primes, as the density of primes drops really fast as the number increases. The only complication there would be making a thread that is holding a higher value than other threads until all lower value threads finish, so that you can guarantee the results.

 

If using Fermat, or other primality test alike, you can have tons of threads calculating pseudoprimes. :D

I think he's just generating all prime numbers in a set range. If that is the case, I'd rather use the sieve of Eratosthenes. It's simple not only in concept but also in implementation. Read about it here : https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes .

#include <bits/stdc++.h>
static const int N=0x989680;
bool p[N+1];
int main() {

    std::fill(std::begin(p)+2,std::end(p),1);
    for(int i=2;i<=N;i++)
        if(p[i])
            for(int k=2;k*i<=N;++k)
                p[k*i]=0;
}

This generates all prime numbers smaller than 10'000'000 in around 120 milliseconds for me. For a range of up to 100'000, it does it in about 0.5 milliseconds. I'm working on an i7 4710HQ.

The algorithm has a complexity of O(nloglogn), but memory is O(n).

 

Anyway, you can further optimize it in pretty obvious manners, although they are mostly unnecessary for what you're trying to do, I think.

Firstly, any even number with the exception of 2 isn't prime, that's pretty obvious.

One technique often used in algorithmic problems is splitting the workload in segments of sqrt n. This reduces memory usage to O(sqrt n) and increases performance if we choose the sizes of our segments accordingly, that is if they can fit into the CPU's cache memory, and not RAM.

Another technique we could use here is called wheel factorization, which is basically used to skip multiples of small primes. You can, for example, use a modulo 210 wheel to skip multiples of 2,3,5 and 7. Read more about it here : https://en.wikipedia.org/wiki/Wheel_factorization

Parallelization is yet another thing that could improve the execution time.

 

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
Share on other sites

Link to post
Share on other sites

7 hours ago, pipnina said:

Thanks for that one! I was just trying to tweak that primeSearch/2 value.

It's been a while since programming for me, so excuse me if I mess up my thoughts a bit.

While it is true that sqrt(primeSearch) > p(i) (where p(i) is some prime) is a good way to detect that you don't need to do any more calculations, remember not to put sqrt(primSearch) into the inner for loop.

 

Consider the following

int z, i;
for(z=0; z < 10000; z++) {
    for(i=0; sqrt(z) > i; ++i) { //sqrt(z) gets evaluated every single time for i, despite it effectively being static
  		//Some code here
    }
}
  
//Now consider the following
int z, i, sqt;
for(z=0; z < 10000; z++) {
    sqt = (int) sqrt(z);
    for(i=0; sqt > i; ++i) { //Only evaluated once
  		//Some code here
    }
}

Actually, someone please correct me if I am wrong, but doing a nested for loop with for(int i....is that more, less, or equivalent efficiency to declaring the variables outside of the loop?

3735928559 - Beware of the dead beef

Link to comment
Share on other sites

Link to post
Share on other sites

Well you don't say the exact way the problem is stated so depending on restrictions you could be very clever.

For example, if the user will only ever enter a number between 0 and 100k to test your app, you could just "cache" all the prime numbers in an array in your application... the list of these prime numbers if 58 KB in nice formatted html page: http://primos.mat.br/primeiros_10000_primos.txt  but you could just define a constant array of numbers and type each number in its hexadecimal notation ex 0x71EF for 29167

The lookup in the array is going to be faster than determining if the number is prime or not.

 

For higher numbers, you could look into OpenCL ... basically testing multiple numbers in parallel to see if they're prime or not, by getting all those hundreds of compute units to work.

Link to comment
Share on other sites

Link to post
Share on other sites

Using bitshifts it is a lot faster then your Implementation. it is the implementation of Sieve of Eratosthenes:

On 4.11.2016 at 8:48 PM, Nineshadow said:

I'd rather use the sieve of Eratosthenes. It's simple not only in concept but also in implementation. Read about it here : https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes .


#include <bits/stdc++.h>
static const int N=0x989680;
bool p[N+1];
int main() {

    std::fill(std::begin(p)+2,std::end(p),1);
    for(int i=2;i<=N;i++)
        if(p[i])
            for(int k=2;k*i<=N;++k)
                p[k*i]=0;
}

 

 

Link to comment
Share on other sites

Link to post
Share on other sites

On 11/4/2016 at 0:17 PM, pipnina said:

I'm making a prime number generator, my lecturer has said that the fastest code gets £20 (GBP) of his own money (With the catch that he's also in the contest).

I think my code is pretty quick, but wondered if there were any further ways to optimize it.

 


    //Determining if a number is prime
    for(int primeSearch=3; primeSearch<COUNT_TO; primeSearch+=2){  //Because we skip even numbers, we know we don't need to test the prime candidates by dividing them by two or ten.
        bool primeFound = false;
        for(unsigned int vectorScan=0; vectorScan<=primeVector.size(); vectorScan++){                   //For every prime number in the vector:
            if(primeSearch%primeVector[vectorScan]==0 && primeSearch!=primeVector[vectorScan]){break;}  //Make sure that the candidate is not divisible by any of the previous primes.
            if(primeVector[vectorScan]>(primeSearch/2)){                                                //If the prime number isn't disproved by the time we've started dividing by-
                primeFound=true;                                                                        //-more than half of the candidate's value, we assume it's prime.
                break;
            }
        }
        if(primeFound == true){
            primeVector.push_back(primeSearch);                                                        //Add the prime number to the list
        }
    }

Currently it executes between 100 and 200 milliseconds when generating prime numbers below a value of 100'000 (In release mode, on a 6700k)

 

Thanks :)

    ~pip

This is essentially somewhat optimized trial division. Another optimization that you can make here, which I think that @espurritado mentioned was that you only need to check up to the square root of the maximum number in your range. This is because that, for every number n, there is a number  F1 < √n such that there is a corresponding multiple of F1, √n < F< n.

This is a really complicated way of saying: For every factor of a number below it's square root, there is exactly one and only one corresponding factor above that numbers square root. There are no factors on either side of a numbers square root that do not have a corresponding factor on the other side of the root. The factor higher than the square root may be calculated as follows: F= n / F1 Where Fis the smaller factor.

 

Beyond that, it's generally considered that the fastest way to generate primes is a Sieve of Atkin, but good luck with that. The next best way is a Sieve of Eratosthenes.

ENCRYPTION IS NOT A CRIME

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×