Jump to content

Need help with optimizing an algorithm

Go to solution Solved by Ciccioo,

As far as number 2 goes, there's 2 problems :

A track is not defragmented when the first N clusters are busy. It's defragmented when all busy clusters are next to each other (which could be anywhere).

The other thing is that the disk is circular, so if there's 2 clusters, one at the begining of the track, and the other at the end, they are actually next to each other.

right, i did not think about that. how about grandpa gets an ssd and we don't have to deal with this 2010 crap anymore

 

you observed the right thing, only the ends of the sequence change. another good news is that you don't give two craps about the order in which clusters are in the current subsequence, you only need to know how many free clusters there are, so you can proceed like this:

for each track:    count the number N of busy clusters in the track    count the number M of free clusters present in the first N clusters starting from sector 0 (the same way as said before)    call this value of M "bestM" and keep track of which sector this subsequence starts at    now we gotta slide        'pop' the first cluster:            if the first cluster is free, decrement M by one. do nothing otherwise        'push' in the next cluster:            if the next cluster is free, increment M by one. do nothing otherwise        if this value of M is better than bestM (less moves), then update bestM and keep track of this subsequence starting sector        keep going until you run the full circle

the modulo method is nice for working on circular arrays, only notice that if the track is N sectors long and it has M busy clusters you don't need to iterate up to 2*N but you can stop at N+M

Defrag 
 
The hard disk (hard drive) is a device used for data storage. Storage is done on a magnetic surface on the platters arranged round metallic. On a platter, data is organized in tracks and sectors and the area located at the intersection of a track and a sector called cluster. 
A cluster can have two states: open, if there is no data, or busy, when it contains data. 
A turntable is called defragmented if all busy clusters on each track are placed consecutively. Defragmentation is accomplished by moving some busy clusters and aims to reduce data access time. Moving a cluster is transferring data from a cluster occupied by a free cluster on the same track.
 
qc1YG5g.png
 
Requirement
Knowing the number of tracks and sectors S P of a turntable, number and position of clusters busy to write a program that determines: 
1. the number of tracks which all free clusters; 
2. The minimum number of moves cluster for each track individually, so the dish to be defragmented.
 
Input data
The first line of input file defrag.in V is the natural number whose value can only be 1 or 2. 
On the second line of the input file are two integers P and S, separated by a space, meaning the statement. 
The third line contains a natural number representing the total number of clusters C occupied by the glass, and on each of the following C lines is a pair of values ​​p i and i, 1 ≤ i ≤ C, separated by a space, representing the track or sector where each cluster is busy.
 
Output data
The output file is defrag.out. 
If the value of V is 1 then the output file will contain the first line a natural number representing the number of tracks that were all free clusters. 
If the value of V is 2 where the output file will contain the first line of natural numbers denoted P $ M $ i, 1 ≤ i ≤ $ P, separated by a single space, where $ M and the minimum number of moves cluster, of those on the track and so busy roadway and clusters to find in a consecutive order.
 
Limitation
1 ≤ P ≤ 100
1 ≤ S ≤ 360
1 ≤ C ≤ P * S
  • the tracks are numbered from 1 to P from the outer track;
  • sectors are numbered from 1 to S in the clockwise direction from one sector;
  • if a track is all free clusters, then the required second requirement is 0;
  • 20% of tests will have the value V = 1, and 80% of tests will have the value V = 2.

Examples :

1.

4 8
10
1 1
1 3
1 5
1 7
4 5
4 1
4 6
4 8
2 2
2 4

[spoiler =defrag.out]

1

2.

spoiler='defrag.in']

4 8
10
1 1
1 3
1 5
1 7
4 5
4 1
4 6
4 8
2 2
2 4

[spoiler =defrag.out]

2 1 0 1

 

 

Here's my code :

Editing this in a sec.

#include <iostream>#include <fstream>using namespace std;ifstream in("defrag.in");ofstream out("defrag.out");int v, p, c, s;bool m[100][360];int nr_clustere_p[100];void citeste() {	in >> v >> p >> s >> c;	for (int i = 0; i < c; i++)	{		int aux1, aux2;		in >> aux1 >> aux2;		m[aux1-1][aux2-1] = 1;	}}void calc_clustere_pista() {	for (int i = 0; i < p; i++)	{		int nr_clustere = 0;		for (int j = 0; j < s; j++)			if (m[i][j] == 1) nr_clustere++;		nr_clustere_p[i] = nr_clustere;	}}int main() {	citeste();	if (v == 1) {		int R=0;		for (int i = 0; i < p; i++) {			bool este_liber = true;			for (int j = 0; j < s; j++)				if (m[i][j] == 1) {					este_liber = false; break;				}			if (este_liber) R++;		}		out << R;	}	if (v == 2) {		calc_clustere_pista();		int r, r_min;		for (int i = 0; i < p; i++)		{			if (nr_clustere_p[i]) {				r_min = 999999999;				for (int j = 0; j <= s * 2 - nr_clustere_p[i]; j++)				{					r = 0;					for (int k = j; k < j + nr_clustere_p[i];k++)					{						cout << k << ' ';						if (m[i][k%s] == 0) r++;					}						cout << endl;					if (r < r_min) r_min = r;				}				out << r_min << ' ';			}			else { out << 0 << ' '; }		}	}	return 0;}

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
https://linustechtips.com/topic/412255-need-help-with-optimizing-an-algorithm/
Share on other sites

Link to post
Share on other sites

either my english is horrible, or this is google translated. possibly a mix of the two.

 

for number 1:

it's alright already. you could save a bunch of code by just calling calc_clustere_pista() and then using std::count() to count how many zeroes there are in nr_clustere_p;

 

number 2:

you can make it much simpler, let's talk about a single track:

you have a track, you know how many busy clusters it has (let's say N), and you know which ones are busy/free.

you know that when the track is defragmented the first N clusters will be busy. every free cluster needs to be written over by the data of a busy cluster.

so that's your answer. the number of moves required is the number of free clusters in the first N clusters of the track. again, this is a one-liner using std::count

Link to post
Share on other sites

either my english is horrible, or this is google translated. possibly a mix of the two.

 

for number 1:

it's alright already. you could save a bunch of code by just calling calc_clustere_pista() and then using std::count() to count how many zeroes there are in nr_clustere_p;

 

number 2:

you can make it much simpler, let's talk about a single track:

you have a track, you know how many busy clusters it has (let's say N), and you know which ones are busy/free.

you know that when the track is defragmented the first N clusters will be busy. every free cluster needs to be written over by the data of a busy cluster.

so that's your answer. the number of moves required is the number of free clusters in the first N clusters of the track. again, this is a one-liner using std::count

Yes, it is google translated. I wrote the post quickly as I had to go somewhere. I've now come back.Sorry for that.

Jeez, even the source code is using variables in my native lanugage.

 

Anyway, thanks for the std::find(). Haven't used it yet, but I will from now on.

 

As far as number 2 goes, there's 2 problems :

A track is not defragmented when the first N clusters are busy. It's defragmented when all busy clusters are next to each other (which could be anywhere).

The other thing is that the disk is circular, so if there's 2 clusters, one at the begining of the track, and the other at the end, they are actually next to each other.

qc1YG5g.png

This was rather easy to overcome, I just had to make/use a circular array.I could have doubled it, but it would have wasted memory, instead , when accessing an element from it (in the index range of 0 and 2*<size>), I access the element at <index> modulo <size>. It has the same effect as doubling the array would , but it's more memory efficient.

 

So, now let me expain what I did at number 2, which is very similar to what you were saying :

Let's say we defragmented the hard drive. In a specific track, we would get a continuous subsequence with the size of the total number of clusters present in that track. This subsequence can start anywhere, but only some of them imply a minimum number of moves. So, I find the number of free clusters for each of this type of subsequence(of size equal to the number of clusters in the track), then the number of moves is the minimum of those numbers.

 

What I thought about optimizing it so far :

I could optimize those subsequences, since when it is shifted right (or left), the only things that really change are begining and the end.

Example, let's say I have the array :

0 1 1 0 0 1 1 1 1 0 1 0 1

My first subsequence would be :

0 1 1 0 0 1 1 1

Then 1 1 0 0 1 1 1 1.

Then 1 0 0 1 1 1 1 0.

 

As I've said, only the edges (is that even properly used?) really change. The rest gets shifted.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to post
Share on other sites

As far as number 2 goes, there's 2 problems :

A track is not defragmented when the first N clusters are busy. It's defragmented when all busy clusters are next to each other (which could be anywhere).

The other thing is that the disk is circular, so if there's 2 clusters, one at the begining of the track, and the other at the end, they are actually next to each other.

right, i did not think about that. how about grandpa gets an ssd and we don't have to deal with this 2010 crap anymore

 

you observed the right thing, only the ends of the sequence change. another good news is that you don't give two craps about the order in which clusters are in the current subsequence, you only need to know how many free clusters there are, so you can proceed like this:

for each track:    count the number N of busy clusters in the track    count the number M of free clusters present in the first N clusters starting from sector 0 (the same way as said before)    call this value of M "bestM" and keep track of which sector this subsequence starts at    now we gotta slide        'pop' the first cluster:            if the first cluster is free, decrement M by one. do nothing otherwise        'push' in the next cluster:            if the next cluster is free, increment M by one. do nothing otherwise        if this value of M is better than bestM (less moves), then update bestM and keep track of this subsequence starting sector        keep going until you run the full circle

the modulo method is nice for working on circular arrays, only notice that if the track is N sectors long and it has M busy clusters you don't need to iterate up to 2*N but you can stop at N+M

Link to post
Share on other sites

right, i did not think about that. how about grandpa gets an ssd and we don't have to deal with this 2010 crap anymore

 

you observed the right thing, only the ends of the sequence change. another good news is that you don't give two craps about the order in which clusters are in the current subsequence, you only need to know how many free clusters there are, so you can proceed like this:

for each track:    count the number N of busy clusters in the track    count the number M of free clusters present in the first N clusters starting from sector 0 (the same way as said before)    call this value of M "bestM" and keep track of which sector this subsequence starts at    now we gotta slide        'pop' the first cluster:            if the first cluster is free, decrement M by one. do nothing otherwise        'push' in the next cluster:            if the next cluster is free, increment M by one. do nothing otherwise        if this value of M is better than bestM (less moves), then update bestM and keep track of this subsequence starting sector        keep going until you run the full circle

the modulo method is nice for working on circular arrays, only notice that if the track is N sectors long and it has M busy clusters you don't need to iterate up to 2*N but you can stop at N+M

About the modulo thing, you're right. I knew I should have stopped before 2*N, but I didn't exactly know when, so I decided to stick with it.

 

Anyway, that seems like it. I was trying to figure out when to increment or decrement , but you got that figured out before me.

 

Thanks.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to post
Share on other sites

About the modulo thing, you're right. I knew I should have stopped before 2*N, but I didn't exactly know when, so I decided to stick with it.

if you imagine the disk and the subsequence that slides around it, it becomes visually easy to spot when you already iterated over all the interesting places

actually N+M is where the head of the sliding subseq needs to stop, but the first cell will start from 0 and go up to N-1, so the loop signature will look just normal

 

happy to be of help ^_^

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×