Algo problem in C

g0ated · June 15, 2025

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEN 15
#define TABLE_SIZE 131072

#define hashsize(n) ((unsigned long)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

unsigned long oaat(const char *key, unsigned long len, unsigned long bits) {
    unsigned long hash, i;
    for (hash = 0, i = 0; i < len; i++) {
    hash += key[i];
    hash += (hash << 10);
    hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash & hashmask(bits);
}

typedef struct entry {
    char *substr;
    struct entry* next;
    char* ogPassword;
} entry;

static entry* substrings[TABLE_SIZE] = {NULL};

char* slice(char* substringBuffer, const char* str, int start, int end) {
    int j = 0;
    for (int i = start; i <= end; i++) {
        substringBuffer[j] = str[i];
        j++;
    }
    substringBuffer[j] = '\0';
    return substringBuffer;
}

void add(const char *password) {
    int key;
    int length = strlen(password);
    
    for (int start = 0; start < length; start++) {
        for (int end = start; end < length; end++) {
            char *buffer = (char *) malloc(LEN * sizeof(char));
            slice(buffer, password, start, end);
            key = oaat(buffer, strlen(buffer), 17);
            entry* temp = (entry *) malloc(sizeof(entry));
            temp->ogPassword = malloc(strlen(password) + 1); 
            strcpy(temp->ogPassword, password);
            temp->substr = buffer; 
            temp->next = substrings[key]; 
            substrings[key] = temp;
        }
    }
}

int query(const char *password, const char* og) {
    int key = oaat(password, strlen(password), 17);
    int counter = 0;
    entry* cur = substrings[key];
    while(cur != NULL) {
        if ((strcmp(cur->substr, password) == 0) && strcmp(cur->ogPassword, og) != 0) { counter++; }
        cur = cur->next;
    }
    return counter;
}

int main() {
    int operations;
    scanf("%d", &operations);
    int opcode;
    char password[12];
    for (int i = 0; i < operations; i++) {
        scanf("%d", &opcode);
        scanf("%s", password);

        if (opcode == 1) {
            add(password);
        }
        else if (opcode == 2) {
            printf("%d\n", query(password, password));
        }
    }
    return 0;
}

This is the link to the problem: https://dmoj.ca/problem/coci17c1p3hard

It's not working, and I can't figure out why. Thanks in advance

Edited June 15, 2025 by goatedpenguin
Added syntax highlighting

Haswellx86 · June 16, 2025

My DSA skills are rusty and also undeveloped (still need to start my leetcode journey) and its 5 in the morning and I need to sleep (my schedule is cooked fr).

What is stopping you from fixing it with AI? My code ran.

I am not even going to dare to understand it. Managing a hash map in C is ugly. Just learn C++ by now or use Python or JavaScript for DSA. You should provide some explanation so it becomes easier to understand.

While trying to implement it myself, I first thought about a simple char db[Q][11] which will be our database and as I was thinking about implementing a hash map I thought instead of that, I could use a 2D array dictionary where the outer array consists of a-z and the sub-arrays contain the position of each character in the database, optimizing by making use of a single number by using the formula ((Nth query - 1) * 11) + offset where Nth query - 1 are all the queries that we skipped, 11 is the length of each query (including null terminator) and the offset is the offset in the current query. To find the sub string, we could jump to all occurrences of the first letter of our sub string to find by using our dictionary.

The database was allocated on the stack and I thought of allocating the dictionary on the stack as well but I realized the worst case of 100,000 queries * 10 max length means 1 million possible characters, and that too if we want to put them under 26 arrays that would be 26 million and if they contain 32 bit integers, that would be 104 MB + the worst case of 1 MB of the database which of course isn't going to fit on the stack. But still the worst case amount of characters are 1 million and not 26 million so perhaps if we could fit the a-z dictionary in the 1 million space only but since dictionaries work on a pre-defined offset for search, that isn't possible to adjust.

You could use DMA to start with much lesser then.

But then I realized a hash map is indeed a much better solution. I thought earlier because I had the same idea of search but instead of hashing 'a', we literally just have its ASCII value so why hash it, but hashing all the sub strings of the string didn't come to my mind. I thought that's like difficult and also would be slow, but the thing is, each query has a max of 10 characters so the max substrings in a query will be 54 sub strings, which though will take more time than storing the positions of 10 characters in my earlier implementation, when it comes to searching for that sub string, with the hash map it is instant but with the dictionary, you would still have to go all around the database and check each occurrence of the first letter of the sub string and see if the sub string completes. My earlier implementation could be faster in some scenarios, especially lighter ones but when it comes to having a large number of queries, hash map could be better. Also with the hash map you only need a Hash Table whereas with mine you needed a database and a dictionary but that is simpler.

I don't know what you are doing with bit shifts. Use a simple hash function like this -

#define TABLE_SIZE 100000
unsigned int hash(char* str){
    unsigned int value = 0;
    while(*str){
        value = (value * 31) + *str++;
    }
    return value % TABLE_SIZE;
}

Remember to account for hash collisions and do linear probing.

I would start with getting the string to store, write an algorithm to extract all the sub strings and hash them, and at that location of the hash, increment the count but only if the sub string is unique for that query because you could get multiple same sub strings from a single query. After that when asked to search, hash to the sub string you want to output the count. Try to allocate the Hash Map on the stack as well, it helps in performance.

g0ated · June 16, 2025

1 hour ago, Haswellx86 said:

What is stopping you from fixing it with AI? My code ran.

Whats stopping from anyone using ai? Because it’s bad, I tried and it doesn’t work.

1 hour ago, Haswellx86 said:

I am not even going to dare to understand it. Managing a hash map in C is ugly. Just learn C++ by now or use Python or JavaScript for DSA. You should provide some explanation so it becomes easier to understand.

It’s not if you know it well. C is the best language for learning DS and algos because you are not given any libraries to do any work for you making it simple and easy to stay on point. Besides my goal isn’t to use c++ and using py or js is the worst for learning DSA. This a pretty short problem so providing an explanation isn’t necessary to me at least.

1 hour ago, Haswellx86 said:

While trying to implement it myself, I first thought about a simple char db[Q][11] which will be our database and as I was thinking about implementing a hash map I thought instead of that, I could use a 2D array dictionary where the outer array consists of a-z and the sub-arrays contain the position of each character in the database, optimizing by making use of a single number by using the formula ((Nth query - 1) * 11) + offset where Nth query - 1 are all the queries that we skipped, 11 is the length of each query (including null terminator) and the offset is the offset in the current query. To find the sub string, we could jump to all occurrences of the first letter of our sub string to find by using our dictionary.

This doesn’t make sense at all.

1 hour ago, Haswellx86 said:

The database was allocated on the stack and I thought of allocating the dictionary on the stack as well but I realized the worst case of 100,000 queries * 10 max length means 1 million possible characters, and that too if we want to put them under 26 arrays that would be 26 million and if they contain 32 bit integers, that would be 104 MB + the worst case of 1 MB of the database which of course isn't going to fit on the stack. But still the worst case amount of characters are 1 million and not 26 million so perhaps if we could fit the a-z dictionary in the 1 million space only but since dictionaries work on a pre-defined offset for search, that isn't possible to adjust.

You could use DMA to start with much lesser then.

A-Z dict wdym??? The hash table doesn’t have to be that big that’s the whole point of the hashing function and yes what you said is already implied.

1 hour ago, Haswellx86 said:

But then I realized a hash map is indeed a much better solution. I thought earlier because I had the same idea of search but instead of hashing 'a', we literally just have its ASCII value so why hash it, but hashing all the sub strings of the string didn't come to my mind. I thought that's like difficult and also would be slow, but the thing is, each query has a max of 10 characters so the max substrings in a query will be 54 sub strings, which though will take more time than storing the positions of 10 characters in my earlier implementation, when it comes to searching for that sub string, with the hash map it is instant but with the dictionary, you would still have to go all around the database and check each occurrence of the first letter of the sub string and see if the sub string completes. My earlier implementation could be faster in some scenarios, especially lighter ones but when it comes to having a large number of queries, hash map could be better. Also with the hash map you only need a Hash Table whereas with mine you needed a database and a dictionary but that is simpler.

I don’t think you understand what a hashmap and a dictionary are, there are the same thing. A dictionary is a Python term for a hashmap, the hashmap is already implemented. your worrying too much about memory the max size limit is 128MB which is more than enough. The database is the hash table.

1 hour ago, Haswellx86 said:
I don't know what you are doing with bit shifts. Use a simple hash function like this -
#define TABLE_SIZE 100000
unsigned int hash(char* str){
    unsigned int value = 0;
    while(*str){
        value = (value * 31) + *str++;
    }
    return value % TABLE_SIZE;
}
Remember to account for hash collisions and do linear probing.

I am using jenson’s hash algorithm which is good since it minimizes collisions and hence the bit shifts. Your hash function is horrible and incredibly slow.

This whole post is misleading especially with you confusing what a hashmap and dict are, plus you just gave a lecture on some random crap instead of looking at the real problem which is when I am querying for a password p and searching it in the hash table to count how many substrings match it excluding the substrings it made when it was added, previously. This would require some sort of id in each entry, in my case, I used the og password to exclude any substrings that matched with my password inputted at first. However, I can’t figure out why it doesn’t work though it looks like it should.

wasab · June 16, 2025

Summary of problem:

Adding a password → store all possible substrings of the password.
Querying a string → count how many other passwords (not equal to the query string itself) have this string as a substring.

Problematic behavior in your implementation:

Bug location is in how substrings are hashed and stored.

Your current implementation:

At each add(password) call:
- Creates all possible substrings.
- Stores an entry for each substring in a hash table substrings with the substring as substr, and the original password as ogPassword.

In query(password, password):

Hash the query password and check the linked list at that hash bucket.
For each matching substr equal to query string and different ogPassword, increment counter.

The issue:
You hash each substring individually by value. But — different substrings from different passwords could hash to the same bucket, and your lookup only scans the linked list at a single hash bucket for a given query string's hash value.

That means:

You’ll only see entries whose hashed value exactly matches the hash of the full query string.
But many substrings of other passwords containing the query string could hash to other hash buckets.

→ The hashing strategy here breaks correctness because you can't rely on the hash bucket of the query string to contain all substrings equal to it.
Two identical strings will always hash to the same value, yes — but you would have had to insert that exact string as a substring somewhere to be in that bucket.

In short:
The current design can only ever find substrings equal to the query string if it was itself inserted as a substring during a password’s add operation.
This works, but it's highly inefficient: for querying, you'd need to find all passwords whose substrings contain the query string — and with only a hash lookup by exact string length, you miss all other potential locations.

Correct, scalable design should either:

Use a trie (prefix tree) to store all substrings and allow efficient substring matching during queries.
Brute-force: Store a list of all passwords, and on query, for each password (except the query one itself), check strstr() if query string is a substring.
- It’s acceptable here since passwords are short (≤10 characters) and per problem constraints.

Minimal working fix:

If sticking to your current hash-table based approach:

During query(password, password) — instead of hashing the query and looking only at that bucket, you'd need to:
- Iterate over all hash buckets.
- Check each entry in all linked lists.
- For entries where substr == password and ogPassword != password, increment the counter.

Why? Because identical substrings could hash to different buckets unless the hash function is perfectly consistent (but that's unnecessary overhead for this).

Option 2: Trie-Based Substring Indexing

Idea:

Each node represents a character.
Paths down the trie form substrings.
Each node tracks which passwords inserted this substring (as a linked list or set).
On query: follow the trie path for the query string. If path exists — count number of unique passwords associated with this substring (excluding the query password itself).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PASSWORD_LEN 12
#define ALPHABET_SIZE 26
#define MAX_USERS 131072

typedef struct PasswordNode {
    char* password;
    struct PasswordNode* next;
} PasswordNode;

typedef struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    PasswordNode* passwords;
} TrieNode;

TrieNode* createTrieNode() {
    TrieNode* node = (TrieNode*)malloc(sizeof(TrieNode));
    for (int i = 0; i < ALPHABET_SIZE; i++)
        node->children[i] = NULL;
    node->passwords = NULL;
    return node;
}

TrieNode* root;
char* passwords[MAX_USERS];
int passwordCount = 0;

void addPasswordToNode(TrieNode* node, const char* password) {
    PasswordNode* p = node->passwords;
    while (p != NULL) {
        if (strcmp(p->password, password) == 0) return;  // already recorded
        p = p->next;
    }

    PasswordNode* newPasswordNode = (PasswordNode*)malloc(sizeof(PasswordNode));
    newPasswordNode->password = (char*)malloc(strlen(password) + 1);
    strcpy(newPasswordNode->password, password);
    newPasswordNode->next = node->passwords;
    node->passwords = newPasswordNode;
}

void add(const char* password) {
    passwords[passwordCount] = malloc(strlen(password) + 1);
    strcpy(passwords[passwordCount], password);
    passwordCount++;

    int len = strlen(password);
    for (int start = 0; start < len; start++) {
        TrieNode* curr = root;
        for (int i = start; i < len; i++) {
            int idx = password[i] - 'a';
            if (!curr->children[idx])
                curr->children[idx] = createTrieNode();
            curr = curr->children[idx];
            addPasswordToNode(curr, password);
        }
    }
}

int query(const char* queryString) {
    TrieNode* curr = root;
    int len = strlen(queryString);
    for (int i = 0; i < len; i++) {
        int idx = queryString[i] - 'a';
        if (!curr->children[idx])
            return 0;
        curr = curr->children[idx];
    }

    int counter = 0;
    PasswordNode* p = curr->passwords;
    while (p != NULL) {
        if (strcmp(p->password, queryString) != 0)
            counter++;
        p = p->next;
    }
    return counter;
}

void freeTrie(TrieNode* node) {
    if (!node) return;
    for (int i = 0; i < ALPHABET_SIZE; i++)
        freeTrie(node->children[i]);

    PasswordNode* p = node->passwords;
    while (p) {
        PasswordNode* temp = p;
        p = p->next;
        free(temp->password);
        free(temp);
    }
    free(node);
}

int main() {
    int operations;
    scanf("%d", &operations);
    int opcode;
    char password[MAX_PASSWORD_LEN];

    root = createTrieNode();

    for (int i = 0; i < operations; i++) {
        scanf("%d %s", &opcode, password);
        if (opcode == 1) {
            add(password);
        } else if (opcode == 2) {
            printf("%d\n", query(password));
        }
    }

    // clean up
    for (int i = 0; i < passwordCount; i++)
        free(passwords[i]);
    freeTrie(root);

    return 0;
}

Do not underestimate AI, buddy. I have many LeetCode submissions where a silly bug had me banging my head debugging for hours, but a short copy and paste into AI immediately spits out the line where the bug occurs. Use AI like StackOverflow. This is pretty much what AI is, just a better version of online googling. it wont work 100% usually, that's true but it is quite close. After some manual tweaks, it can be adjusted to work correctly, saving much time.

g0ated · June 16, 2025

2 hours ago, wasab said:
Summary of problem:

Adding a password → store all possible substrings of the password.

Querying a string → count how many other passwords (not equal to the query string itself) have this string as a substring.

Problematic behavior in your implementation:

Bug location is in how substrings are hashed and stored.

Your current implementation:

At each add(password) call:

Creates all possible substrings.

Stores an entry for each substring in a hash table substrings with the substring as substr, and the original password as ogPassword.

In query(password, password):

Hash the query password and check the linked list at that hash bucket.

For each matching substr equal to query string and different ogPassword, increment counter.

The issue:
You hash each substring individually by value. But — different substrings from different passwords could hash to the same bucket, and your lookup only scans the linked list at a single hash bucket for a given query string's hash value.

That means:

You’ll only see entries whose hashed value exactly matches the hash of the full query string.

But many substrings of other passwords containing the query string could hash to other hash buckets.

→ The hashing strategy here breaks correctness because you can't rely on the hash bucket of the query string to contain all substrings equal to it.
Two identical strings will always hash to the same value, yes — but you would have had to insert that exact string as a substring somewhere to be in that bucket.

In short:
The current design can only ever find substrings equal to the query string if it was itself inserted as a substring during a password’s add operation.
This works, but it's highly inefficient: for querying, you'd need to find all passwords whose substrings contain the query string — and with only a hash lookup by exact string length, you miss all other potential locations.

Correct, scalable design should either:

Use a trie (prefix tree) to store all substrings and allow efficient substring matching during queries.

Brute-force: Store a list of all passwords, and on query, for each password (except the query one itself), check strstr() if query string is a substring.

It’s acceptable here since passwords are short (≤10 characters) and per problem constraints.

Minimal working fix:

If sticking to your current hash-table based approach:

During query(password, password) — instead of hashing the query and looking only at that bucket, you'd need to:

Iterate over all hash buckets.

Check each entry in all linked lists.

For entries where substr == password and ogPassword != password, increment the counter.

Why? Because identical substrings could hash to different buckets unless the hash function is perfectly consistent (but that's unnecessary overhead for this).

Option 2: Trie-Based Substring Indexing

Idea:

Each node represents a character.

Paths down the trie form substrings.

Each node tracks which passwords inserted this substring (as a linked list or set).

On query: follow the trie path for the query string. If path exists — count number of unique passwords associated with this substring (excluding the query password itself).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PASSWORD_LEN 12
#define ALPHABET_SIZE 26
#define MAX_USERS 131072

typedef struct PasswordNode {
    char* password;
    struct PasswordNode* next;
} PasswordNode;

typedef struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    PasswordNode* passwords;
} TrieNode;

TrieNode* createTrieNode() {
    TrieNode* node = (TrieNode*)malloc(sizeof(TrieNode));
    for (int i = 0; i < ALPHABET_SIZE; i++)
        node->children[i] = NULL;
    node->passwords = NULL;
    return node;
}

TrieNode* root;
char* passwords[MAX_USERS];
int passwordCount = 0;

void addPasswordToNode(TrieNode* node, const char* password) {
    PasswordNode* p = node->passwords;
    while (p != NULL) {
        if (strcmp(p->password, password) == 0) return;  // already recorded
        p = p->next;
    }

    PasswordNode* newPasswordNode = (PasswordNode*)malloc(sizeof(PasswordNode));
    newPasswordNode->password = (char*)malloc(strlen(password) + 1);
    strcpy(newPasswordNode->password, password);
    newPasswordNode->next = node->passwords;
    node->passwords = newPasswordNode;
}

void add(const char* password) {
    passwords[passwordCount] = malloc(strlen(password) + 1);
    strcpy(passwords[passwordCount], password);
    passwordCount++;

    int len = strlen(password);
    for (int start = 0; start < len; start++) {
        TrieNode* curr = root;
        for (int i = start; i < len; i++) {
            int idx = password[i] - 'a';
            if (!curr->children[idx])
                curr->children[idx] = createTrieNode();
            curr = curr->children[idx];
            addPasswordToNode(curr, password);
        }
    }
}

int query(const char* queryString) {
    TrieNode* curr = root;
    int len = strlen(queryString);
    for (int i = 0; i < len; i++) {
        int idx = queryString[i] - 'a';
        if (!curr->children[idx])
            return 0;
        curr = curr->children[idx];
    }

    int counter = 0;
    PasswordNode* p = curr->passwords;
    while (p != NULL) {
        if (strcmp(p->password, queryString) != 0)
            counter++;
        p = p->next;
    }
    return counter;
}

void freeTrie(TrieNode* node) {
    if (!node) return;
    for (int i = 0; i < ALPHABET_SIZE; i++)
        freeTrie(node->children[i]);

    PasswordNode* p = node->passwords;
    while (p) {
        PasswordNode* temp = p;
        p = p->next;
        free(temp->password);
        free(temp);
    }
    free(node);
}

int main() {
    int operations;
    scanf("%d", &operations);
    int opcode;
    char password[MAX_PASSWORD_LEN];

    root = createTrieNode();

    for (int i = 0; i < operations; i++) {
        scanf("%d %s", &opcode, password);
        if (opcode == 1) {
            add(password);
        } else if (opcode == 2) {
            printf("%d\n", query(password));
        }
    }

    // clean up
    for (int i = 0; i < passwordCount; i++)
        free(passwords[i]);
    freeTrie(root);

    return 0;
}
Do not underestimate AI, buddy. I have many LeetCode submissions where a silly bug had me banging my head debugging for hours, but a short copy and paste into AI immediately spits out the line where the bug occurs. Use AI like StackOverflow. This is pretty much what AI is, just a better version of online googling. it wont work 100% usually, that's true but it is quite close. After some manual tweaks, it can be adjusted to work correctly, saving much time.

Thanks for the in depth explanation.

wanderingfool2 · July 11, 2025

On 6/15/2025 at 7:38 PM, wasab said:

Do not underestimate AI, buddy. I have many LeetCode submissions where a silly bug had me banging my head debugging for hours, but a short copy and paste into AI immediately spits out the line where the bug occurs. Use AI like StackOverflow. This is pretty much what AI is, just a better version of online googling. it wont work 100% usually, that's true but it is quite close. After some manual tweaks, it can be adjusted to work correctly, saving much time.

While AI is great using AI as a crutch to complete work isn't going to let you learn correctly. AI also has pitfalls that beginners will not likely notice and then fall into very bad habits.

Case in point, it fails to generate properly working code:

2

1 aaa

2 aaa

It returns 0 (clearly off here)

3

1 aaa

2 a

It returns 1 (no where does it state that passwords are unique, just how many passwords match...so you can have 2 passwords the same)

The other thing is while it talks about issues with the original code, it fails to point out using unsafe functions. I could easily overflow things by putting in a password longer than 10 characters.

Overall I wouldn't be impressed by what it had produced. While it does admittedly get closer to the idea I would use, overall things seem inefficient.

To @g0ated at least from a glance of your code, ultimately what fails is that you are slicing up the passwords and inserting them which means that something like

aaa is being added as a, aa, aaa (it's actually worse because you have the double for loop) into essentially the password list without the proper distinction that they truly belong to a single password. While it's nice in that it means you can do more simple comparisons later I would say it's not worth the memory inefficiency by doing it like that. (Like at that point it's almost better to just brute force it every single time).

There's also the general issue about hash collisions. If you insert into the table but there already was one that matched you will not insert a new version of it (despite the fact that users can have the same password). This creates a general issue overall. In this case I would just completely do away with hashing all together and stick to some form of list.

I know it's a late reply, but it's my 2 cents in regards to this because the AI answer I found to be inadequate and potentially misleading

wasab · July 11, 2025

3 hours ago, wanderingfool2 said:

While AI is great using AI as a crutch to complete work isn't going to let you learn correctly. AI also has pitfalls that beginners will not likely notice and then fall into very bad habits.

Yeah, it's probably not the right answer. I never tested the code it spew out. Didn't mean to tell people to copy everything from AI verbatim. The issues it pointed out with op's original code snippet sounds good enough.

wanderingfool2 · July 12, 2025

7 hours ago, wasab said:

Yeah, it's probably not the right answer. I never tested the code it spew out. Didn't mean to tell people to copy everything from AI verbatim. The issues it pointed out with op's original code snippet sounds good enough.

Hashing, and in general the hash buckets are a potential issue...but I would argue that it's not the fundamental issue. I think the issue is more about breaking down each password into multiple substrings etc.

It's actually why I don't necessarily agree with just purely putting code into AI and asking it to fix it when there might not originally be an understanding of algorithms etc. It's a good ability to be able to analyze code yourself and figure out where things are breaking/thinking about why it's breaking. If you always use AI to do such a thing you can quickly forget about the trivial little things that you think the AI will have handled.

As another example, MAX_USERS...strictly speaking the max users can only ever be 100,000

Honestly though the use of unsafe functions is quite telling as well...people will think it works and for the most cases it will be but things like

2

1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

2 a

will corrupt the memory,

You also will have memory corruptions/reads when using

2

1 ABCDEFG

2 A

As it doesn't have proper error handling when char - a >= 26 (i.e. like capital letters) and more so when char - a < 0

It's why AI can be a powerful learning tool, but at the same time you need to be aware that the answers and information it can tell can be garbage or very bad habits (that once upon a time wouldn't have been considered bad but in the modern day is)

g0ated · July 12, 2025

9 hours ago, wasab said:

Yeah, it's probably not the right answer. I never tested the code it spew out. Didn't mean to tell people to copy everything from AI verbatim. The issues it pointed out with op's original code snippet sounds good enough.

I ended up not using a trie since it was not covered in my algo book yet so I choose to use an array of substring to check stuff(kinda forgot about the problem) but here is my solution:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define LEN 11
#define TABLE_SIZE 131072

#define hashsize(n) ((unsigned long)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

unsigned long oaat(const char *key, unsigned long len, unsigned long bits) {
    unsigned long hash, i;
    for (hash = 0, i = 0; i < len; i++) {
    hash += key[i];
    hash += (hash << 10);
    hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash & hashmask(bits);
}

typedef struct entry {
    int count;
    char* substring;
    struct entry* next;
} entry;

static entry* hashtable[TABLE_SIZE] = {NULL};

char* slice(const char* str, int start, int end) {
    char* substringBuffer = (char *) malloc(LEN * sizeof(char));
    int j = 0;
    for (int i = start; i <= end; i++) {
        substringBuffer[j] = str[i];
        j++;
    }
    substringBuffer[j] = '\0';
    return substringBuffer;
}


void add (const char *password) {
    int key;
    int len = strlen(password);
    char* substrings_seen[100];
    int num_seen = 0;
    for (int i = 0; i < len; i++) {
        for (int j = i; j < len; j++) {
            bool seenBefore = false;
            char* substring;
            substring = slice(password, i, j);

            for (int k = 0; k < num_seen; k++) {
                if (strcmp(substrings_seen[k], substring) == 0) {
                    seenBefore = true;
                    break;
                }
            }

            if (!seenBefore) {
                substrings_seen[num_seen++] = substring;
                key = oaat(substring, strlen(substring), 17);
                entry* cur = hashtable[key];
                int isThere = 0;
                while (cur != NULL) { 
                    if (!(strcmp(cur->substring, substring))) {
                        cur->count++;
                        isThere = 1;
                        break;
                    }
                    cur = cur->next;
                }
                if (!isThere) {
                    cur = (entry *) malloc(sizeof(entry));
                    cur->count = 1;
                    cur->substring = substring;
                    cur->next = hashtable[key];
                    hashtable[key] = cur;
                }
            } else {
                free(substring);
            }
        }
    }
}

int query (const char *str) {
    int key = oaat(str, strlen(str), 17);
    entry* cur = hashtable[key];
    while (cur != NULL) {
        if(!strcmp(cur->substring, str)) {
            return cur->count;
        }
        cur = cur->next;
    }
    return 0;
}


int main(int argc, char** argv) {
    int operations, opcode;
    char password[LEN];
    scanf("%d", &operations);

    for (int i = 0; i < operations; i++) {
        scanf("%d%s", &opcode, password);

        if (opcode == 1) {
            add(password);
        } else {
            printf("%d\n", query(password));
        }
    }
}

2 hours ago, wanderingfool2 said:

Hashing, and in general the hash buckets are a potential issue...but I would argue that it's not the fundamental issue. I think the issue is more about breaking down each password into multiple substrings etc.

It's actually why I don't necessarily agree with just purely putting code into AI and asking it to fix it when there might not originally be an understanding of algorithms etc. It's a good ability to be able to analyze code yourself and figure out where things are breaking/thinking about why it's breaking. If you always use AI to do such a thing you can quickly forget about the trivial little things that you think the AI will have handled.

As another example, MAX_USERS...strictly speaking the max users can only ever be 100,000

Honestly though the use of unsafe functions is quite telling as well...people will think it works and for the most cases it will be but things like

2

1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

2 a

will corrupt the memory,

You also will have memory corruptions/reads when using

2

1 ABCDEFG

2 A

As it doesn't have proper error handling when char - a >= 26 (i.e. like capital letters) and more so when char - a < 0

It's why AI can be a powerful learning tool, but at the same time you need to be aware that the answers and information it can tell can be garbage or very bad habits (that once upon a time wouldn't have been considered bad but in the modern day is)

Thanks for your two cents, I ended up asking chatgpt about the bug in my code which @wasab helped me figure out, and it gave me some code that I could add to fix it. AI IMO only excels if you know what the problem is. It's quite terrible in terms of finding errors in huge code bases or when the problem like this one is feed into it.

wanderingfool2 · July 14, 2025

On 7/12/2025 at 1:11 AM, g0ated said:

Thanks for your two cents, I ended up asking chatgpt about the bug in my code which @wasab helped me figure out, and it gave me some code that I could add to fix it. AI IMO only excels if you know what the problem is. It's quite terrible in terms of finding errors in huge code bases or when the problem like this one is feed into it.

A few comments, just as an educational thing.

sscanf can be a dangerous function since it doesn't bother doing buffer checks. It's why in your version if I wanted to I could cause it to crash with specific inputs (and if I had enough time I could write a payload which would be exploitable).

Definitely a good first attempt, and honestly a lot of people probably do end up writing code like that in their career (I know I've written my fair share of exploitable code...but mainly for inhouse programs that speed of development trumps the maybe 100 users who use it so I didn't view as a security risk).

Anyways, scanf can be troublesome, but there are ways around it.

fgets is better in that it accepts a buffer.

e.g.

char lineBuffer[32];
fgets(lineBuffer, sizeof lineBuffer, stdin);

The above, even if they put in 2 asdflkajtwlkejttoaijsdfoiawjflwjaeflkjasdlfkjalsdkjfalskdjflaskjfdlasjkdflaksdjflaskjdfa

you will not get it corrupting memory.

Good that you changed using a straight hash function to adding it in with the idea of collisions being potentially present.

Your substrings_seen[100] is using a magic number, and strictly speaking is overkill for a password limit of 10. At a size limit of 10, there are technically only 55 entries (10 + 1) * 10) / 2. Although then again the driving force is the 55 new passwords stored each time, so I guess it doesn't matter as much...but still something to think of.

Not that I recommend doing this, but a neat optimization that could occur (if lets say you were doing to have a lot of large numbers etc)...contrained to a - z, that means there are exactly 26 options (so you can store it between 0 - 25)...which means you could use 5 bits to represent each character...which means a UINT64 (unsigned long long) you could hold a password of length 12...and by having it as an integer it makes comparisons so much quicker....but that's just an over optimization at that point :p

FuzzyLogic · July 14, 2025

I thought this problem was interesting and I haven't used C in several years, so I tried it myself without having looked at any of the other solutions. Here's my code:

Spoiler

#include <stdio.h>
#include <malloc.h>
#include <string.h>

// BEGIN SET

#define STRINGSET_STARTING_CAPACITY 10

struct stringset {
    char** strings;
    size_t size;
    size_t capacity;
};

struct stringset stringset_init() {
    struct stringset set;
    set.strings = malloc(sizeof(char*)*STRINGSET_STARTING_CAPACITY);
    set.capacity = STRINGSET_STARTING_CAPACITY;
    set.size = 0;
    return set;
}

void stringset_add(struct stringset* set, char* str) {
    // First ensure the set doesn't already contain the specified string
    for (size_t i = 0; i < set->size; i++) {
        if (strcmp(str, set->strings[i]) == 0) {
            // Set already contains the string, nothing left to do
            return;
        }
    }

    // The set doesn't already contain the string, so we add it, expanding the backing array if needed
    if (set->size == set->capacity) {
        set->strings = realloc(set->strings, set->capacity*2*sizeof(char*));
        set->capacity *= 2;
    }

    set->strings[set->size] = str;
    set->size++;
}

void stringset_destroy(struct stringset* set) {
    for (size_t i = 0; i < set->size; i++) {
        free(set->strings[i]);
    }
    free(set->strings);
}

// END SET

// BEGIN TRIE

#define NUM_ALLOWED_CHARS 26

struct node {
    unsigned int data;
    struct node* children[NUM_ALLOWED_CHARS];
};

void trie_add(struct node* root_node, char* str) {
    struct node* curr_node = root_node;

    // Walk down the trie, using the current char within the string to select the child at each node
    for (size_t i = 0; i < strlen(str); i++) {
        size_t index = str[i] - 'a';

        // Create the next child node if it doesn't exist yet
        if (curr_node->children[index] == NULL) {
            struct node *new_node = malloc(sizeof(struct node));
            new_node->data = 0;
            memset(new_node->children, 0, NUM_ALLOWED_CHARS * sizeof(struct node *));
            curr_node->children[index] = new_node;
        }

        // Go 1 step further down the trie
        curr_node = curr_node->children[index];
    }

    // At this point, we are at the node targeted by the input string
    curr_node->data++;
}

unsigned int trie_query(struct node* root_node, char* str) {
    struct node* curr_node = root_node;

    // Use the same strategy as trie_add, but immediately stop if we encounter a child node that doesn't exist yet
    for (size_t i = 0; i < strlen(str); i++) {
        size_t index = str[i] - 'a';

        if (curr_node->children[index] == NULL) {
            // This node was never added, so the password must appear 0 times
            return 0;
        } else {
            // Go 1 step further down the trie
            curr_node = curr_node->children[index];
        }
    }

    // At this point, we are at the node targeted by the input string
    return curr_node->data;
}

// END TRIE

#define MAX_PASSWORD_LENGTH 10

struct stringset get_substrings(char* str) {
    struct stringset ss = stringset_init();

    for (size_t num_chars_to_copy = 1; num_chars_to_copy <= strlen(str); num_chars_to_copy++) {
        for (size_t str_index = 0; str_index < strlen(str) - num_chars_to_copy + 1; str_index++) {
            char* substr = calloc(num_chars_to_copy+1, sizeof(char));
            strncpy(substr, str+str_index, num_chars_to_copy);
            stringset_add(&ss, substr);
        }
    }

    return ss;
}

int main() {
    struct node root;
    root.data = 0;
    memset(root.children, 0, NUM_ALLOWED_CHARS*sizeof(struct node*));
    char password[MAX_PASSWORD_LENGTH+1];
    memset(password, 0, MAX_PASSWORD_LENGTH+1);

    unsigned int n_queries;
    scanf("%u", &n_queries);

    for (unsigned int i = 0; i < n_queries; i++) {
        unsigned int query_type;
        scanf("%u %s", &query_type, password);

        if (query_type == 1) {
            struct stringset substrings = get_substrings(password);
            for (size_t index = 0; index < substrings.size; index++) {
                trie_add(&root, substrings.strings[index]);
            }
            stringset_destroy(&substrings);
        } else {
            unsigned int n_accounts = trie_query(&root, password);
            printf("%u\n", n_accounts);
        }

        memset(password, 0, MAX_PASSWORD_LENGTH+1);
    }

    return 0;
}

Explanation:

Spoiler

My solution is centered around a modified trie, which provides quick insertion and access. When adding a password, I first split it into all possible substrings and add them to a set so that it only contains unique substrings. For the input password "aaa", the set will only contain the substrings "a", "aa", and "aaa".

Once we have all unique substrings, we add them to a trie where each node contains the number of times its associated substring appeared as well as an array of 26 pointers to child nodes. These child pointers are only populated when needed, during an insertion operation.

To add the substring "abc" to the trie, we increment the value of the node at root->a->b->c.

To query the trie with substring "xy", we return the value at node root->x->y.

I'm aware there are MANY optimizations possible here, but it ran within the time and memory limits so I stopped there.

And yes, I know you should check the return values of scanf and malloc-and-friends, and use fgets so that user input can't cause a buffer overflow. Oh well

Now looking at your code, a hash table is a valid way of solving this problem. And yes, it looks like the original problem was with generating the unique substrings.

mariushm · July 22, 2025

A bit late to this but parsed the posts now and realized nobody caught this ....

"The passwords consist of at least one and at most 10 lowercase letters of the English alphabet."

So you're dealing with 0x61 to 0x7A (97 to 122) .. 26 unique values ... you can store 32 values (0..31) in 5 bits, so you only need 10 bits to "hash" two characters of a password.

So you could have 1024 "buckets" , parse all the passwords, and read two consecutive characters at a time from each password and add the unique ID of the row to that bucket (and maybe other info that could be useful like the length of the password in that row)

When you have the new password, you can again split in two characters at a time, and you can simply look in the bucket with that 10 bit code to see which rows have password which match your small two character sequence. Copy the IDs to your results. Get the next two character sequence, copy the IDs from that bucket to your results making sure you don't have duplicates.... and so on ... at most you have 9 pairs of characters, so you'll access at most 9 buckets out of 1024.

Then once you have all the rows with passwords that contain at least a 2 character sequence found in your new password, you can start with the maximum possible length ( password length - 1, minimum 2) and loop through the results you found, and check all rows that have passwords with length at least your length + 1

edit apparently passwords can be even 1 char long. Could still hash the first char as 31 << 5 + code of first char.

Sign In

Algo problem in C

Summary of problem:

Problematic behavior in your implementation:

Correct, scalable design should either:

Minimal working fix:

Option 2: Trie-Based Substring Indexing

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Summary of problem:

Problematic behavior in your implementation:

Correct, scalable design should either:

Minimal working fix:

Option 2: Trie-Based Substring Indexing

Link to comment

Share on other sites

Link to post

Share on other sites

Summary of problem:

Problematic behavior in your implementation:

Correct, scalable design should either:

Minimal working fix:

Option 2: Trie-Based Substring Indexing

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Link to comment

Share on other sites

Link to post

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Topics

Latest From Linus Tech Tips:

The Biggest Test Bench I’ve Ever Seen

Latest From ShortCircuit:

The coolest looking monitor. Period. - ASUS ROG display at Computex (Sponsored)

Latest From TechLinked:

This Summer’s Lookin’ Steamy

Latest From GameLinked:

This Was A GOOD One...

Latest From Tech Quickie:

The Secret Council Behind Every Emoji

Latest From The WAN Show: