Jump to content

Need help with a problem

Nineshadow
Go to solution Solved by mathijs727,
51 minutes ago, Nineshadow said:

That's what I was thinking as well, but then I realized it's not that simple.

Suppose you have

a=1011101

b=1101101

a<=b even if it has the 3rd bit (from the left) equal to 1, while b has it equal to 0. That's because the previous bit in a, which is more significant, is smaller than the previous bit in b. 

 

So I thought about it this way:

Suppose I traverse the tree until I get to the most significant bit of b. I only go down the nodes which indicate a value of 0, since otherwise a would be bigger than b. Once I reach the most significant bit of b, I have two choices: If the bit in a is 0, then the rest of the binary representation of a could be whatever, because a<=b at that point. I can stop searching and add to the solutions. But if the bit in a is 1, then I must continue searching. So I search to the next most significant bit in b, going down only through the 0 nodes again, then do the same thing.

Maybe I described it incorrectly but the point is to traverse left to right (most to least significant), so my approach would stop at the second bit.

In terms of traversal, C++ code (havent tried to compile) would be (without the XOR):

void traverse(
	Node* node,
	std::uint32_t compValue,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
		return;
	}


	std::uint32_t bit = compValue & (1 << 31);
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		traverse(node->leftChild, compValue, false, results);
		traverse(node->rightChild, compValue, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

 

EDIT:

With XOR I think traversal would be this:

void traverse(
	Node* node,
	std::uint32_t compValue,
	std::uint32_t mask,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
		return;
	}


	std::uint32_t bit = (compValue & (1 << 31)) >> 31;// Get left most bit
	std::uint32_t maskBit = (mask & (1 << 31)) >> 31;// Get left most bit
	bit ^= maskBit;// Bitwise XOR
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		mask <<= 1;// Do the same for the mask
		traverse(node->leftChild, compValue, mask, false, results);
		traverse(node->rightChild, compValue, mask, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

 

You are given a multiset, initially empty, on which you can do 3 operations:

  • insert(x) : insert number x into the multiset
  • delete(i) : delete the i-th number inserted into the multiset
  • query(a,b) : count how many numbers x there are in the multiset such that x xor a <= b.

O(n) for the queries isn't fast enough, so I tried looking for something along the lines of O(logn).

I thought I could use a trie/prefix tree. If you think of the binary representation of a number as a string, then you can construct a trie out of it. Well, it will also be a binary tree since we can only branch out to '0's and '1's. Nevertheless, in each node I store how many numbers there are with a specific prefix. Since I know how many numbers there are with a specific prefix, then I can also calculate how many numbers xor another one there are with a specific prefix. Now it's just the matter of selecting the ones which are smaller than b, and this is where I'm having problems.

 

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
Share on other sites

Link to post
Share on other sites

8 hours ago, fizzlesticks said:

Do you have any test data?

The input data is structured as follows:

First, a number n: number of queries.

Next n lines are of the form (q,a) or (q,a,b).

If q, is 0 the instruction corresponds to the insert(a) operation.

If q, is 1 the instruction corresponds to delete(a) operation.

If q is 2, the instruction corresponds to the query(a,b) operation.

That being said :

Input:
5
0 1
0 2
2 3 1
1 2
2 3 1
Output:
1
0

Oh, and I guess I should have some constraints. @Erik Sieghart

number of queries <= 200.000

1 <= any numbers in the input <= 2.000.000.000

And yes,they are integers.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
Share on other sites

Link to post
Share on other sites

If you're looking for log n, binary search is a good idea to do, and since you have prefixes for your numbers it may be a good idea to partition your data into multiple tables identified by the prefix, depending on the variety of prefixes do binary search on that, then perform binary search on a specific table, it will work if data across partitions is reasonably even, otherwise you may have a penalty on certain partitions that are larger than others. A solution to that is partitioning your data using hashes, but you can potentially get away with numbers anyway. That's how some databases to it for data balancing between servers and fast (<1ms) access, and it adapts well to large amounts of data. Implementing this in-memory should give you the performance you want, also, this could lend itself to some parallelism too, I don't know how well would it lend itself to just sequential reads.

Link to comment
Share on other sites

Link to post
Share on other sites

5 hours ago, DevBlox said:

If you're looking for log n, binary search is a good idea to do, and since you have prefixes for your numbers it may be a good idea to partition your data into multiple tables identified by the prefix, depending on the variety of prefixes do binary search on that, then perform binary search on a specific table, it will work if data across partitions is reasonably even, otherwise you may have a penalty on certain partitions that are larger than others. A solution to that is partitioning your data using hashes, but you can potentially get away with numbers anyway. That's how some databases to it for data balancing between servers and fast (<1ms) access, and it adapts well to large amounts of data. Implementing this in-memory should give you the performance you want, also, this could lend itself to some parallelism too, I don't know how well would it lend itself to just sequential reads.

I think a binary tree is the way to go.

Have a node for every bit in the integer.

 

The integer less than operation (asuming unsigned int) is nothing more than a bitscan from left to right through the two numbers you're comparing.

For the first pair of bits that are not the same: a=0&b=1->true, a=1&b=1->false (for a<b).

Less than or equal returns true when it reaches the end (so when all the bits are the same).

 

For example:

a = 010101

b = 010011

a <= b holds false, because if you iterate from through a and b from left to right, in the first unequal pair you encounter, the a bit is 1 and in b it is 0.

 

So when you would traverse the tree without the XOR, you would stop traversal of a subtree when at level i the node holds value 1, but the i'th bit of your query value (b in ?<=b) equals 0.

I think with the XOR you should do the same, but decide to stop based on the value of the node XOR'd with the i'th bit of the mask (a in (a xor ? <= b)).

 

Just made this up, it might be wrong.

BTW: this is for little endian, in big endian you should iterate from the other side.

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro

Link to comment
Share on other sites

Link to post
Share on other sites

Oops double post, linus doesnt like my phone?

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro

Link to comment
Share on other sites

Link to post
Share on other sites

25 minutes ago, mathijs727 said:

I think a binary tree is the way to go.

Have a node for every bit in the integer.

 

The integer less than operation (asuming unsigned int) is nothing more than a bitscan from left to right through the two numbers you're comparing.

For the first pair of bits that are not the same: a=0&b=1->true, a=1&b=1->false (for a<b).

Less than or equal returns true when it reaches the end (so when all the bits are the same).

 

For example:

a = 010101

b = 010011

a <= b holds false, because if you iterate from through a and b from left to right, in the first unequal pair you encounter, the a bit is 1 and in b it is 0.

 

So when you would traverse the tree without the XOR, you would stop traversal of a subtree when at level i the node holds value 1, but the i'th bit of your query value (b in ?<=b) equals 0.

I think with the XOR you should do the same, but decide to stop based on the value of the node XOR'd with the i'th bit of the mask (a in (a xor ? <= b)).

 

Just made this up, it might be wrong.

BTW: this is for little endian, in big endian you should iterate from the other side.

That's what I was thinking as well, but then I realized it's not that simple.

Suppose you have

a=1011101

b=1101101

a<=b even if it has the 3rd bit (from the left) equal to 1, while b has it equal to 0. That's because the previous bit in a, which is more significant, is smaller than the previous bit in b. 

 

So I thought about it this way:

Suppose I traverse the tree until I get to the most significant bit of b. I only go down the nodes which indicate a value of 0, since otherwise a would be bigger than b. Once I reach the most significant bit of b, I have two choices: If the bit in a is 0, then the rest of the binary representation of a could be whatever, because a<=b at that point. I can stop searching and add to the solutions. But if the bit in a is 1, then I must continue searching. So I search to the next most significant bit in b, going down only through the 0 nodes again, then do the same thing.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
Share on other sites

Link to post
Share on other sites

5 hours ago, DevBlox said:

If you're looking for log n, binary search is a good idea to do, and since you have prefixes for your numbers it may be a good idea to partition your data into multiple tables identified by the prefix, depending on the variety of prefixes do binary search on that, then perform binary search on a specific table, it will work if data across partitions is reasonably even, otherwise you may have a penalty on certain partitions that are larger than others. A solution to that is partitioning your data using hashes, but you can potentially get away with numbers anyway. That's how some databases to it for data balancing between servers and fast (<1ms) access, and it adapts well to large amounts of data. Implementing this in-memory should give you the performance you want, also, this could lend itself to some parallelism too, I don't know how well would it lend itself to just sequential reads.

Yeah, an x-fast trie pretty much.

After that add subdivisioning, balancing and amortization and you get y-fast tries.

Anyway, it's a possibility but it's too complicated for this problem. It's not supposed to be that hard.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
Share on other sites

Link to post
Share on other sites

51 minutes ago, Nineshadow said:

That's what I was thinking as well, but then I realized it's not that simple.

Suppose you have

a=1011101

b=1101101

a<=b even if it has the 3rd bit (from the left) equal to 1, while b has it equal to 0. That's because the previous bit in a, which is more significant, is smaller than the previous bit in b. 

 

So I thought about it this way:

Suppose I traverse the tree until I get to the most significant bit of b. I only go down the nodes which indicate a value of 0, since otherwise a would be bigger than b. Once I reach the most significant bit of b, I have two choices: If the bit in a is 0, then the rest of the binary representation of a could be whatever, because a<=b at that point. I can stop searching and add to the solutions. But if the bit in a is 1, then I must continue searching. So I search to the next most significant bit in b, going down only through the 0 nodes again, then do the same thing.

Maybe I described it incorrectly but the point is to traverse left to right (most to least significant), so my approach would stop at the second bit.

In terms of traversal, C++ code (havent tried to compile) would be (without the XOR):

void traverse(
	Node* node,
	std::uint32_t compValue,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
		return;
	}


	std::uint32_t bit = compValue & (1 << 31);
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		traverse(node->leftChild, compValue, false, results);
		traverse(node->rightChild, compValue, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

 

EDIT:

With XOR I think traversal would be this:

void traverse(
	Node* node,
	std::uint32_t compValue,
	std::uint32_t mask,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
		return;
	}


	std::uint32_t bit = (compValue & (1 << 31)) >> 31;// Get left most bit
	std::uint32_t maskBit = (mask & (1 << 31)) >> 31;// Get left most bit
	bit ^= maskBit;// Bitwise XOR
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		mask <<= 1;// Do the same for the mask
		traverse(node->leftChild, compValue, mask, false, results);
		traverse(node->rightChild, compValue, mask, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

 

Desktop: Intel i9-10850K (R9 3900X died 😢 )| MSI Z490 Tomahawk | RTX 2080 (borrowed from work) - MSI GTX 1080 | 64GB 3600MHz CL16 memory | Corsair H100i (NF-F12 fans) | Samsung 970 EVO 512GB | Intel 665p 2TB | Samsung 830 256GB| 3TB HDD | Corsair 450D | Corsair RM550x | MG279Q

Laptop: Surface Pro 7 (i5, 16GB RAM, 256GB SSD)

Console: PlayStation 4 Pro

Link to comment
Share on other sites

Link to post
Share on other sites

3 hours ago, mathijs727 said:

Maybe I described it incorrectly but the point is to traverse left to right (most to least significant), so my approach would stop at the second bit.

In terms of traversal, C++ code (havent tried to compile) would be (without the XOR):


void traverse(
	Node* node,
	std::uint32_t compValue,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
		return;
	}


	std::uint32_t bit = compValue & (1 << 31);
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		traverse(node->leftChild, compValue, false, results);
		traverse(node->rightChild, compValue, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

 

EDIT:

With XOR I think traversal would be this:


void traverse(
	Node* node,
	std::uint32_t compValue,
	std::uint32_t mask,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
		return;
	}


	std::uint32_t bit = (compValue & (1 << 31)) >> 31;// Get left most bit
	std::uint32_t maskBit = (mask & (1 << 31)) >> 31;// Get left most bit
	bit ^= maskBit;// Bitwise XOR
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		mask <<= 1;// Do the same for the mask
		traverse(node->leftChild, compValue, mask, false, results);
		traverse(node->rightChild, compValue, mask, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

 

Nevermind, I'm an idiot. I was simply not stopping from searching when the value I find is already smaller than what I'm looking for. Whops.

Here's what my source code looks like for those interested.

Spoiler

 


#include <bits/stdc++.h>
#define MAXM 200005
using namespace std;
ifstream in("test.in");
ofstream out("test.out");
int n,m,sol,a,b,c,v[MAXM];
bitset<31> nr,comp;
struct node{
    int t;
    node *zero,*one;
    node():t(0),zero(nullptr),one(nullptr){}
};
node *root=new node();
void insert(node *nod=root,int pos=30)
{
    if(pos==-1)
        return;
    if(nr[pos])
    {
        if(nod->one==nullptr)
            nod->one=new node();
        nod->one->t++;
        insert(nod->one,pos-1);
    }
    else
    {
        if(nod->zero==nullptr)
            nod->zero=new node();
        nod->zero->t++;
        insert(nod->zero,pos-1);
    }
}
void remove(node *nod=root,int pos=30)
{
    if(pos==-1)
        return;
    if(nr[pos])
    {
        remove(nod->one,pos-1);
        nod->one->t--;
        if(nod->one->t==0)
        {
            delete nod->one;
            nod->one=nullptr;
        }
    }
    else
    {
        remove(nod->zero,pos-1);
        nod->zero->t--;
        if(nod->zero->t==0)
        {
            delete nod->zero;
            nod->zero=nullptr;
        }
    }
}
void query(node *nod=root, int pos=30)
{
    if(nod->one==nullptr && nod->zero==nullptr)
    {
        sol+=nod->t;
        return;
    }
    if(nod->one!=nullptr)
    {
        if((1^nr[pos])<comp[pos])
            sol+=nod->one->t;
        if((1^nr[pos])==comp[pos])
            query(nod->one,pos-1);
    }
    if(nod->zero!=nullptr)
    {
        if((0^nr[pos])<comp[pos])
            sol+=nod->zero->t;
        if((0^nr[pos])==comp[pos])
            query(nod->zero,pos-1);
    }
}
int main()
{
    in>>m;
    while(m--)
    {
        in>>a>>b;
        if(a==0)
        {
            v[n++]=b;
            nr=bitset<31>(b);
            insert();
        }
        if(a==1)
        {
            nr=bitset<31>(v[b-1]);
            remove();
        }
        if(a==2)
        {
            in>>c;
            nr=bitset<31>(b),comp=bitset<31>(c);
            sol=0;
            query();
            out<<sol<<'\n';
        }
    }
    return 0;
}

 

 

 

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×