Need help with a problem

Nineshadow · February 22, 2017

You are given a multiset, initially empty, on which you can do 3 operations:

insert(x) : insert number x into the multiset
delete(i) : delete the i-th number inserted into the multiset
query(a,b) : count how many numbers x there are in the multiset such that x xor a <= b.

O(n) for the queries isn't fast enough, so I tried looking for something along the lines of O(logn).

I thought I could use a trie/prefix tree. If you think of the binary representation of a number as a string, then you can construct a trie out of it. Well, it will also be a binary tree since we can only branch out to '0's and '1's. Nevertheless, in each node I store how many numbers there are with a specific prefix. Since I know how many numbers there are with a specific prefix, then I can also calculate how many numbers xor another one there are with a specific prefix. Now it's just the matter of selecting the ones which are smaller than b, and this is where I'm having problems.

fizzlesticks · February 22, 2017

Do you have any test data?

Nineshadow · February 23, 2017

8 hours ago, fizzlesticks said:

Do you have any test data?

The input data is structured as follows:

First, a number n: number of queries.

Next n lines are of the form (q,a) or (q,a,b).

If q, is 0 the instruction corresponds to the insert(a) operation.

If q, is 1 the instruction corresponds to delete(a) operation.

If q is 2, the instruction corresponds to the query(a,b) operation.

That being said :

Output:
1
0

Oh, and I guess I should have some constraints. @Erik Sieghart

number of queries <= 200.000

1 <= any numbers in the input <= 2.000.000.000

And yes,they are integers.

DevBlox · February 23, 2017

If you're looking for log n, binary search is a good idea to do, and since you have prefixes for your numbers it may be a good idea to partition your data into multiple tables identified by the prefix, depending on the variety of prefixes do binary search on that, then perform binary search on a specific table, it will work if data across partitions is reasonably even, otherwise you may have a penalty on certain partitions that are larger than others. A solution to that is partitioning your data using hashes, but you can potentially get away with numbers anyway. That's how some databases to it for data balancing between servers and fast (<1ms) access, and it adapts well to large amounts of data. Implementing this in-memory should give you the performance you want, also, this could lend itself to some parallelism too, I don't know how well would it lend itself to just sequential reads.

mathijs727 · February 23, 2017

5 hours ago, DevBlox said:

If you're looking for log n, binary search is a good idea to do, and since you have prefixes for your numbers it may be a good idea to partition your data into multiple tables identified by the prefix, depending on the variety of prefixes do binary search on that, then perform binary search on a specific table, it will work if data across partitions is reasonably even, otherwise you may have a penalty on certain partitions that are larger than others. A solution to that is partitioning your data using hashes, but you can potentially get away with numbers anyway. That's how some databases to it for data balancing between servers and fast (<1ms) access, and it adapts well to large amounts of data. Implementing this in-memory should give you the performance you want, also, this could lend itself to some parallelism too, I don't know how well would it lend itself to just sequential reads.

I think a binary tree is the way to go.

Have a node for every bit in the integer.

The integer less than operation (asuming unsigned int) is nothing more than a bitscan from left to right through the two numbers you're comparing.

For the first pair of bits that are not the same: a=0&b=1->true, a=1&b=1->false (for a<b).

Less than or equal returns true when it reaches the end (so when all the bits are the same).

For example:

a = 010101

b = 010011

a <= b holds false, because if you iterate from through a and b from left to right, in the first unequal pair you encounter, the a bit is 1 and in b it is 0.

So when you would traverse the tree without the XOR, you would stop traversal of a subtree when at level i the node holds value 1, but the i'th bit of your query value (b in ?<=b) equals 0.

I think with the XOR you should do the same, but decide to stop based on the value of the node XOR'd with the i'th bit of the mask (a in (a xor ? <= b)).

Just made this up, it might be wrong.

BTW: this is for little endian, in big endian you should iterate from the other side.

mathijs727 · February 23, 2017

Oops double post, linus doesnt like my phone?

Nineshadow · February 23, 2017

25 minutes ago, mathijs727 said:

I think a binary tree is the way to go.

Have a node for every bit in the integer.

The integer less than operation (asuming unsigned int) is nothing more than a bitscan from left to right through the two numbers you're comparing.

For the first pair of bits that are not the same: a=0&b=1->true, a=1&b=1->false (for a<b).

Less than or equal returns true when it reaches the end (so when all the bits are the same).

For example:

a = 010101

b = 010011

a <= b holds false, because if you iterate from through a and b from left to right, in the first unequal pair you encounter, the a bit is 1 and in b it is 0.

So when you would traverse the tree without the XOR, you would stop traversal of a subtree when at level i the node holds value 1, but the i'th bit of your query value (b in ?<=b) equals 0.

I think with the XOR you should do the same, but decide to stop based on the value of the node XOR'd with the i'th bit of the mask (a in (a xor ? <= b)).

Just made this up, it might be wrong.

BTW: this is for little endian, in big endian you should iterate from the other side.

That's what I was thinking as well, but then I realized it's not that simple.

Suppose you have

a=1011101

b=1101101

a<=b even if it has the 3rd bit (from the left) equal to 1, while b has it equal to 0. That's because the previous bit in a, which is more significant, is smaller than the previous bit in b.

So I thought about it this way:

Suppose I traverse the tree until I get to the most significant bit of b. I only go down the nodes which indicate a value of 0, since otherwise a would be bigger than b. Once I reach the most significant bit of b, I have two choices: If the bit in a is 0, then the rest of the binary representation of a could be whatever, because a<=b at that point. I can stop searching and add to the solutions. But if the bit in a is 1, then I must continue searching. So I search to the next most significant bit in b, going down only through the 0 nodes again, then do the same thing.

Nineshadow · February 23, 2017

5 hours ago, DevBlox said:

If you're looking for log n, binary search is a good idea to do, and since you have prefixes for your numbers it may be a good idea to partition your data into multiple tables identified by the prefix, depending on the variety of prefixes do binary search on that, then perform binary search on a specific table, it will work if data across partitions is reasonably even, otherwise you may have a penalty on certain partitions that are larger than others. A solution to that is partitioning your data using hashes, but you can potentially get away with numbers anyway. That's how some databases to it for data balancing between servers and fast (<1ms) access, and it adapts well to large amounts of data. Implementing this in-memory should give you the performance you want, also, this could lend itself to some parallelism too, I don't know how well would it lend itself to just sequential reads.

Yeah, an x-fast trie pretty much.

After that add subdivisioning, balancing and amortization and you get y-fast tries.

Anyway, it's a possibility but it's too complicated for this problem. It's not supposed to be that hard.

mathijs727 · February 23, 2017

51 minutes ago, Nineshadow said:

That's what I was thinking as well, but then I realized it's not that simple.

Suppose you have

a=1011101

b=1101101

a<=b even if it has the 3rd bit (from the left) equal to 1, while b has it equal to 0. That's because the previous bit in a, which is more significant, is smaller than the previous bit in b.

So I thought about it this way:

Suppose I traverse the tree until I get to the most significant bit of b. I only go down the nodes which indicate a value of 0, since otherwise a would be bigger than b. Once I reach the most significant bit of b, I have two choices: If the bit in a is 0, then the rest of the binary representation of a could be whatever, because a<=b at that point. I can stop searching and add to the solutions. But if the bit in a is 1, then I must continue searching. So I search to the next most significant bit in b, going down only through the 0 nodes again, then do the same thing.

Maybe I described it incorrectly but the point is to traverse left to right (most to least significant), so my approach would stop at the second bit.

In terms of traversal, C++ code (havent tried to compile) would be (without the XOR):

void traverse(
	Node* node,
	std::uint32_t compValue,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
		return;
	}


	std::uint32_t bit = compValue & (1 << 31);
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		traverse(node->leftChild, compValue, false, results);
		traverse(node->rightChild, compValue, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

EDIT:

With XOR I think traversal would be this:

void traverse(
	Node* node,
	std::uint32_t compValue,
	std::uint32_t mask,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
		return;
	}


	std::uint32_t bit = (compValue & (1 << 31)) >> 31;// Get left most bit
	std::uint32_t maskBit = (mask & (1 << 31)) >> 31;// Get left most bit
	bit ^= maskBit;// Bitwise XOR
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		mask <<= 1;// Do the same for the mask
		traverse(node->leftChild, compValue, mask, false, results);
		traverse(node->rightChild, compValue, mask, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

Nineshadow · February 23, 2017

3 hours ago, mathijs727 said:

Maybe I described it incorrectly but the point is to traverse left to right (most to least significant), so my approach would stop at the second bit.

In terms of traversal, C++ code (havent tried to compile) would be (without the XOR):


void traverse(
	Node* node,
	std::uint32_t compValue,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
		return;
	}


	std::uint32_t bit = compValue & (1 << 31);
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		traverse(node->leftChild, compValue, false, results);
		traverse(node->rightChild, compValue, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, true, results);
		traverse(node->rightChild, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

EDIT:

With XOR I think traversal would be this:


void traverse(
	Node* node,
	std::uint32_t compValue,
	std::uint32_t mask,
	bool confirmedSmaller,
	std::vector<std::uint32_t>& results)
{
	if (confirmedSmaller)
	{
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
		return;
	}


	std::uint32_t bit = (compValue & (1 << 31)) >> 31;// Get left most bit
	std::uint32_t maskBit = (mask & (1 << 31)) >> 31;// Get left most bit
	bit ^= maskBit;// Bitwise XOR
	if ((bit >> 31) == node->value)
	{
		compValue <<= 1;// "Pop" the left most bit from the compare value
		mask <<= 1;// Do the same for the mask
		traverse(node->leftChild, compValue, mask, false, results);
		traverse(node->rightChild, compValue, mask, false, results);
	} else if (bit > 0) {
		// Node contains 0, value to which we compare contains 1
		// -> We're smaller than the value to which we compare
		traverse(node->leftChild, 0, 0, true, results);
		traverse(node->rightChild, 0, 0, true, results);
	}// else {
		// Node contains 1, value to which we compare contains 0
		// -> We're larger than the value to which we compare
	//}
}

Nevermind, I'm an idiot. I was simply not stopping from searching when the value I find is already smaller than what I'm looking for. Whops.

Here's what my source code looks like for those interested.

Spoiler


#include <bits/stdc++.h>
#define MAXM 200005
using namespace std;
ifstream in("test.in");
ofstream out("test.out");
int n,m,sol,a,b,c,v[MAXM];
bitset<31> nr,comp;
struct node{
    int t;
    node *zero,*one;
    node():t(0),zero(nullptr),one(nullptr){}
};
node *root=new node();
void insert(node *nod=root,int pos=30)
{
    if(pos==-1)
        return;
    if(nr[pos])
    {
        if(nod->one==nullptr)
            nod->one=new node();
        nod->one->t++;
        insert(nod->one,pos-1);
    }
    else
    {
        if(nod->zero==nullptr)
            nod->zero=new node();
        nod->zero->t++;
        insert(nod->zero,pos-1);
    }
}
void remove(node *nod=root,int pos=30)
{
    if(pos==-1)
        return;
    if(nr[pos])
    {
        remove(nod->one,pos-1);
        nod->one->t--;
        if(nod->one->t==0)
        {
            delete nod->one;
            nod->one=nullptr;
        }
    }
    else
    {
        remove(nod->zero,pos-1);
        nod->zero->t--;
        if(nod->zero->t==0)
        {
            delete nod->zero;
            nod->zero=nullptr;
        }
    }
}
void query(node *nod=root, int pos=30)
{
    if(nod->one==nullptr && nod->zero==nullptr)
    {
        sol+=nod->t;
        return;
    }
    if(nod->one!=nullptr)
    {
        if((1^nr[pos])<comp[pos])
            sol+=nod->one->t;
        if((1^nr[pos])==comp[pos])
            query(nod->one,pos-1);
    }
    if(nod->zero!=nullptr)
    {
        if((0^nr[pos])<comp[pos])
            sol+=nod->zero->t;
        if((0^nr[pos])==comp[pos])
            query(nod->zero,pos-1);
    }
}
int main()
{
    in>>m;
    while(m--)
    {
        in>>a>>b;
        if(a==0)
        {
            v[n++]=b;
            nr=bitset<31>(b);
            insert();
        }
        if(a==1)
        {
            nr=bitset<31>(v[b-1]);
            remove();
        }
        if(a==2)
        {
            in>>c;
            nr=bitset<31>(b),comp=bitset<31>(c);
            sol=0;
            query();
            out<<sol<<'\n';
        }
    }
    return 0;
}

Sign In

Need help with a problem

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Topics

Latest From Linus Tech Tips:

I Will NOT Give You $250 for Your Broken Game - WAN Show April 26, 2024

Latest From Tech Quickie:

Why Are Gaming Laptops So Expensive?

Latest From TechLinked:

Good Riddance, TikTok

Latest From GameLinked:

Is Nintendo being FRAMED?

Latest From ShortCircuit:

I tried 20 influencer foods, here are the best… and the worst…

Latest From Mac Address:

Why did you buy an Apple Vision Pro?

Latest From Channel Super Fun:

I Swapped the CEO's Assistant For a Day!