Jump to content

Algo problem in C

Go to solution Solved by wasab,

📌 Summary of problem:

  • Adding a password → store all possible substrings of the password.

  • Querying a string → count how many other passwords (not equal to the query string itself) have this string as a substring.


📌 Problematic behavior in your implementation:

Bug location is in how substrings are hashed and stored.

Your current implementation:

  • At each add(password) call:

    • Creates all possible substrings.

    • Stores an entry for each substring in a hash table substrings with the substring as substr, and the original password as ogPassword.

In query(password, password):

  • Hash the query password and check the linked list at that hash bucket.

  • For each matching substr equal to query string and different ogPassword, increment counter.

⚠️ The issue:
You hash each substring individually by value. But — different substrings from different passwords could hash to the same bucket, and your lookup only scans the linked list at a single hash bucket for a given query string's hash value.

That means:

  • You’ll only see entries whose hashed value exactly matches the hash of the full query string.

  • But many substrings of other passwords containing the query string could hash to other hash buckets.

→ The hashing strategy here breaks correctness because you can't rely on the hash bucket of the query string to contain all substrings equal to it.
Two identical strings will always hash to the same value, yes — but you would have had to insert that exact string as a substring somewhere to be in that bucket.

In short:
The current design can only ever find substrings equal to the query string if it was itself inserted as a substring during a password’s add operation.
This works, but it's highly inefficient: for querying, you'd need to find all passwords whose substrings contain the query string — and with only a hash lookup by exact string length, you miss all other potential locations.


📌 Correct, scalable design should either:

  • Use a trie (prefix tree) to store all substrings and allow efficient substring matching during queries.

  • Brute-force: Store a list of all passwords, and on query, for each password (except the query one itself), check strstr() if query string is a substring.

    • It’s acceptable here since passwords are short (≤10 characters) and per problem constraints.


📌 Minimal working fix:

If sticking to your current hash-table based approach:

  • During query(password, password) — instead of hashing the query and looking only at that bucket, you'd need to:

    • Iterate over all hash buckets.

    • Check each entry in all linked lists.

    • For entries where substr == password and ogPassword != password, increment the counter.

Why? Because identical substrings could hash to different buckets unless the hash function is perfectly consistent (but that's unnecessary overhead for this).

 

 

Option 2: Trie-Based Substring Indexing

Idea:

  • Each node represents a character.

  • Paths down the trie form substrings.

  • Each node tracks which passwords inserted this substring (as a linked list or set).

  • On query: follow the trie path for the query string. If path exists — count number of unique passwords associated with this substring (excluding the query password itself).

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PASSWORD_LEN 12
#define ALPHABET_SIZE 26
#define MAX_USERS 131072

typedef struct PasswordNode {
    char* password;
    struct PasswordNode* next;
} PasswordNode;

typedef struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    PasswordNode* passwords;
} TrieNode;

TrieNode* createTrieNode() {
    TrieNode* node = (TrieNode*)malloc(sizeof(TrieNode));
    for (int i = 0; i < ALPHABET_SIZE; i++)
        node->children[i] = NULL;
    node->passwords = NULL;
    return node;
}

TrieNode* root;
char* passwords[MAX_USERS];
int passwordCount = 0;

void addPasswordToNode(TrieNode* node, const char* password) {
    PasswordNode* p = node->passwords;
    while (p != NULL) {
        if (strcmp(p->password, password) == 0) return;  // already recorded
        p = p->next;
    }

    PasswordNode* newPasswordNode = (PasswordNode*)malloc(sizeof(PasswordNode));
    newPasswordNode->password = (char*)malloc(strlen(password) + 1);
    strcpy(newPasswordNode->password, password);
    newPasswordNode->next = node->passwords;
    node->passwords = newPasswordNode;
}

void add(const char* password) {
    passwords[passwordCount] = malloc(strlen(password) + 1);
    strcpy(passwords[passwordCount], password);
    passwordCount++;

    int len = strlen(password);
    for (int start = 0; start < len; start++) {
        TrieNode* curr = root;
        for (int i = start; i < len; i++) {
            int idx = password[i] - 'a';
            if (!curr->children[idx])
                curr->children[idx] = createTrieNode();
            curr = curr->children[idx];
            addPasswordToNode(curr, password);
        }
    }
}

int query(const char* queryString) {
    TrieNode* curr = root;
    int len = strlen(queryString);
    for (int i = 0; i < len; i++) {
        int idx = queryString[i] - 'a';
        if (!curr->children[idx])
            return 0;
        curr = curr->children[idx];
    }

    int counter = 0;
    PasswordNode* p = curr->passwords;
    while (p != NULL) {
        if (strcmp(p->password, queryString) != 0)
            counter++;
        p = p->next;
    }
    return counter;
}

void freeTrie(TrieNode* node) {
    if (!node) return;
    for (int i = 0; i < ALPHABET_SIZE; i++)
        freeTrie(node->children[i]);

    PasswordNode* p = node->passwords;
    while (p) {
        PasswordNode* temp = p;
        p = p->next;
        free(temp->password);
        free(temp);
    }
    free(node);
}

int main() {
    int operations;
    scanf("%d", &operations);
    int opcode;
    char password[MAX_PASSWORD_LEN];

    root = createTrieNode();

    for (int i = 0; i < operations; i++) {
        scanf("%d %s", &opcode, password);
        if (opcode == 1) {
            add(password);
        } else if (opcode == 2) {
            printf("%d\n", query(password));
        }
    }

    // clean up
    for (int i = 0; i < passwordCount; i++)
        free(passwords[i]);
    freeTrie(root);

    return 0;
}

 

Do not underestimate AI, buddy. I have many LeetCode submissions where a silly bug had me banging my head debugging for hours, but a short copy and paste into AI immediately spits out the line where the bug occurs. Use AI like StackOverflow. This is pretty much what AI is, just a better version of online googling. it wont work 100% usually, that's true but it is quite close. After some manual tweaks, it can be adjusted to work correctly, saving much time. 

Posted (edited)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEN 15
#define TABLE_SIZE 131072

#define hashsize(n) ((unsigned long)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

unsigned long oaat(const char *key, unsigned long len, unsigned long bits) {
    unsigned long hash, i;
    for (hash = 0, i = 0; i < len; i++) {
    hash += key[i];
    hash += (hash << 10);
    hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash & hashmask(bits);
}

typedef struct entry {
    char *substr;
    struct entry* next;
    char* ogPassword;
} entry;

static entry* substrings[TABLE_SIZE] = {NULL};

char* slice(char* substringBuffer, const char* str, int start, int end) {
    int j = 0;
    for (int i = start; i <= end; i++) {
        substringBuffer[j] = str[i];
        j++;
    }
    substringBuffer[j] = '\0';
    return substringBuffer;
}

void add(const char *password) {
    int key;
    int length = strlen(password);
    
    for (int start = 0; start < length; start++) {
        for (int end = start; end < length; end++) {
            char *buffer = (char *) malloc(LEN * sizeof(char));
            slice(buffer, password, start, end);
            key = oaat(buffer, strlen(buffer), 17);
            entry* temp = (entry *) malloc(sizeof(entry));
            temp->ogPassword = malloc(strlen(password) + 1); 
            strcpy(temp->ogPassword, password);
            temp->substr = buffer; 
            temp->next = substrings[key]; 
            substrings[key] = temp;
        }
    }
}

int query(const char *password, const char* og) {
    int key = oaat(password, strlen(password), 17);
    int counter = 0;
    entry* cur = substrings[key];
    while(cur != NULL) {
        if ((strcmp(cur->substr, password) == 0) && strcmp(cur->ogPassword, og) != 0) { counter++; }
        cur = cur->next;
    }
    return counter;
}

int main() {
    int operations;
    scanf("%d", &operations);
    int opcode;
    char password[12];
    for (int i = 0; i < operations; i++) {
        scanf("%d", &opcode);
        scanf("%s", password);

        if (opcode == 1) {
            add(password);
        }
        else if (opcode == 2) {
            printf("%d\n", query(password, password));
        }
    }
    return 0;
}

This is the link to the problem: https://dmoj.ca/problem/coci17c1p3hard

It's not working, and I can't figure out why. Thanks in advance 🙂

Edited by goatedpenguin
Added syntax highlighting
Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/
Share on other sites

Link to post
Share on other sites

My DSA skills are rusty and also undeveloped (still need to start my leetcode journey) and its 5 in the morning and I need to sleep (my schedule is cooked fr).

 

What is stopping you from fixing it with AI? My code ran.

 

I am not even going to dare to understand it. Managing a hash map in C is ugly. Just learn C++ by now or use Python or JavaScript for DSA. You should provide some explanation so it becomes easier to understand.

 

While trying to implement it myself, I first thought about a simple char db[Q][11] which will be our database and as I was thinking about implementing a hash map I thought instead of that, I could use a 2D array dictionary where the outer array consists of a-z and the sub-arrays contain the position of each character in the database, optimizing by making use of a single number by using the formula ((Nth query - 1) * 11) + offset where Nth query - 1 are all the queries that we skipped, 11 is the length of each query (including null terminator) and the offset is the offset in the current query. To find the sub string, we could jump to all occurrences of the first letter of our sub string to find by using our dictionary.

 

The database was allocated on the stack and I thought of allocating the dictionary on the stack as well but I realized the worst case of 100,000 queries * 10 max length means 1 million possible characters, and that too if we want to put them under 26 arrays that would be 26 million and if they contain 32 bit integers, that would be 104 MB + the worst case of 1 MB of the database which of course isn't going to fit on the stack. But still the worst case amount of characters are 1 million and not 26 million so perhaps if we could fit the a-z dictionary in the 1 million space only but since dictionaries work on a pre-defined offset for search, that isn't possible to adjust. 

You could use DMA to start with much lesser then.

 

But then I realized a hash map is indeed a much better solution. I thought earlier because I had the same idea of search but instead of hashing 'a', we literally just have its ASCII value so why hash it, but hashing all the sub strings of the string didn't come to my mind. I thought that's like difficult and also would be slow, but the thing is, each query has a max of 10 characters so the max substrings in a query will be 54 sub strings, which though will take more time than storing the positions of 10 characters in my earlier implementation, when it comes to searching for that sub string, with the hash map it is instant but with the dictionary, you would still have to go all around the database and check each occurrence of the first letter of the sub string and see if the sub string completes. My earlier implementation could be faster in some scenarios, especially lighter ones but when it comes to having a large number of queries, hash map could be better. Also with the hash map you only need a Hash Table whereas with mine you needed a database and a dictionary but that is simpler.

 

I don't know what you are doing with bit shifts. Use a simple hash function like this - 

#define TABLE_SIZE 100000
unsigned int hash(char* str){
    unsigned int value = 0;
    while(*str){
        value = (value * 31) + *str++;
    }
    return value % TABLE_SIZE;
}

Remember to account for hash collisions and do linear probing.

 

I would start with getting the string to store, write an algorithm to extract all the sub strings and hash them, and at that location of the hash, increment the count but only if the sub string is unique for that query because you could get multiple same sub strings from a single query. After that when asked to search, hash to the sub string you want to output the count. Try to allocate the Hash Map on the stack as well, it helps in performance.

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16749341
Share on other sites

Link to post
Share on other sites

1 hour ago, Haswellx86 said:

What is stopping you from fixing it with AI? My code ran.

Whats stopping from anyone using ai? Because it’s bad, I tried and it doesn’t work. 
 

1 hour ago, Haswellx86 said:

I am not even going to dare to understand it. Managing a hash map in C is ugly. Just learn C++ by now or use Python or JavaScript for DSA. You should provide some explanation so it becomes easier to understand.

It’s not if you know it well. C is the best language for learning DS and algos because you are not given any libraries to do any work for you making it simple and easy to stay on point. Besides my goal isn’t to use c++ and using py or js is the worst for learning DSA. This a pretty short problem so providing an explanation isn’t necessary to me at least. 

 

1 hour ago, Haswellx86 said:

While trying to implement it myself, I first thought about a simple char db[Q][11] which will be our database and as I was thinking about implementing a hash map I thought instead of that, I could use a 2D array dictionary where the outer array consists of a-z and the sub-arrays contain the position of each character in the database, optimizing by making use of a single number by using the formula ((Nth query - 1) * 11) + offset where Nth query - 1 are all the queries that we skipped, 11 is the length of each query (including null terminator) and the offset is the offset in the current query. To find the sub string, we could jump to all occurrences of the first letter of our sub string to find by using our dictionary.

This doesn’t make sense at all.

 

1 hour ago, Haswellx86 said:

The database was allocated on the stack and I thought of allocating the dictionary on the stack as well but I realized the worst case of 100,000 queries * 10 max length means 1 million possible characters, and that too if we want to put them under 26 arrays that would be 26 million and if they contain 32 bit integers, that would be 104 MB + the worst case of 1 MB of the database which of course isn't going to fit on the stack. But still the worst case amount of characters are 1 million and not 26 million so perhaps if we could fit the a-z dictionary in the 1 million space only but since dictionaries work on a pre-defined offset for search, that isn't possible to adjust. 

You could use DMA to start with much lesser then.

A-Z dict wdym??? The hash table doesn’t have to be that big that’s the whole point of the hashing function and yes what you said is already implied. 

 

1 hour ago, Haswellx86 said:

But then I realized a hash map is indeed a much better solution. I thought earlier because I had the same idea of search but instead of hashing 'a', we literally just have its ASCII value so why hash it, but hashing all the sub strings of the string didn't come to my mind. I thought that's like difficult and also would be slow, but the thing is, each query has a max of 10 characters so the max substrings in a query will be 54 sub strings, which though will take more time than storing the positions of 10 characters in my earlier implementation, when it comes to searching for that sub string, with the hash map it is instant but with the dictionary, you would still have to go all around the database and check each occurrence of the first letter of the sub string and see if the sub string completes. My earlier implementation could be faster in some scenarios, especially lighter ones but when it comes to having a large number of queries, hash map could be better. Also with the hash map you only need a Hash Table whereas with mine you needed a database and a dictionary but that is simpler.

I don’t think you understand what a hashmap and a dictionary are, there are the same thing. A dictionary is a Python term for a hashmap, the hashmap is already implemented. your worrying too much about memory the max size limit is 128MB which is more than enough. The database is the hash table. 

 

1 hour ago, Haswellx86 said:

I don't know what you are doing with bit shifts. Use a simple hash function like this - 

#define TABLE_SIZE 100000
unsigned int hash(char* str){
    unsigned int value = 0;
    while(*str){
        value = (value * 31) + *str++;
    }
    return value % TABLE_SIZE;
}

Remember to account for hash collisions and do linear probing.

I am using jenson’s hash algorithm which is good since it minimizes collisions and hence the bit shifts. Your hash function is horrible and incredibly slow. 

This whole post is misleading especially with you confusing what a hashmap and dict are, plus you just gave a lecture on some random crap instead of looking at the real problem which is when I am querying for a password p and searching it in the hash table to count how many substrings match it excluding the substrings it made when it was added, previously. This would require some sort of id in each entry, in my case, I used the og password to exclude any substrings that matched with my password inputted at first. However, I can’t figure out why it doesn’t work though it looks like it should. 

 

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16749359
Share on other sites

Link to post
Share on other sites

📌 Summary of problem:

  • Adding a password → store all possible substrings of the password.

  • Querying a string → count how many other passwords (not equal to the query string itself) have this string as a substring.


📌 Problematic behavior in your implementation:

Bug location is in how substrings are hashed and stored.

Your current implementation:

  • At each add(password) call:

    • Creates all possible substrings.

    • Stores an entry for each substring in a hash table substrings with the substring as substr, and the original password as ogPassword.

In query(password, password):

  • Hash the query password and check the linked list at that hash bucket.

  • For each matching substr equal to query string and different ogPassword, increment counter.

⚠️ The issue:
You hash each substring individually by value. But — different substrings from different passwords could hash to the same bucket, and your lookup only scans the linked list at a single hash bucket for a given query string's hash value.

That means:

  • You’ll only see entries whose hashed value exactly matches the hash of the full query string.

  • But many substrings of other passwords containing the query string could hash to other hash buckets.

→ The hashing strategy here breaks correctness because you can't rely on the hash bucket of the query string to contain all substrings equal to it.
Two identical strings will always hash to the same value, yes — but you would have had to insert that exact string as a substring somewhere to be in that bucket.

In short:
The current design can only ever find substrings equal to the query string if it was itself inserted as a substring during a password’s add operation.
This works, but it's highly inefficient: for querying, you'd need to find all passwords whose substrings contain the query string — and with only a hash lookup by exact string length, you miss all other potential locations.


📌 Correct, scalable design should either:

  • Use a trie (prefix tree) to store all substrings and allow efficient substring matching during queries.

  • Brute-force: Store a list of all passwords, and on query, for each password (except the query one itself), check strstr() if query string is a substring.

    • It’s acceptable here since passwords are short (≤10 characters) and per problem constraints.


📌 Minimal working fix:

If sticking to your current hash-table based approach:

  • During query(password, password) — instead of hashing the query and looking only at that bucket, you'd need to:

    • Iterate over all hash buckets.

    • Check each entry in all linked lists.

    • For entries where substr == password and ogPassword != password, increment the counter.

Why? Because identical substrings could hash to different buckets unless the hash function is perfectly consistent (but that's unnecessary overhead for this).

 

 

Option 2: Trie-Based Substring Indexing

Idea:

  • Each node represents a character.

  • Paths down the trie form substrings.

  • Each node tracks which passwords inserted this substring (as a linked list or set).

  • On query: follow the trie path for the query string. If path exists — count number of unique passwords associated with this substring (excluding the query password itself).

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PASSWORD_LEN 12
#define ALPHABET_SIZE 26
#define MAX_USERS 131072

typedef struct PasswordNode {
    char* password;
    struct PasswordNode* next;
} PasswordNode;

typedef struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    PasswordNode* passwords;
} TrieNode;

TrieNode* createTrieNode() {
    TrieNode* node = (TrieNode*)malloc(sizeof(TrieNode));
    for (int i = 0; i < ALPHABET_SIZE; i++)
        node->children[i] = NULL;
    node->passwords = NULL;
    return node;
}

TrieNode* root;
char* passwords[MAX_USERS];
int passwordCount = 0;

void addPasswordToNode(TrieNode* node, const char* password) {
    PasswordNode* p = node->passwords;
    while (p != NULL) {
        if (strcmp(p->password, password) == 0) return;  // already recorded
        p = p->next;
    }

    PasswordNode* newPasswordNode = (PasswordNode*)malloc(sizeof(PasswordNode));
    newPasswordNode->password = (char*)malloc(strlen(password) + 1);
    strcpy(newPasswordNode->password, password);
    newPasswordNode->next = node->passwords;
    node->passwords = newPasswordNode;
}

void add(const char* password) {
    passwords[passwordCount] = malloc(strlen(password) + 1);
    strcpy(passwords[passwordCount], password);
    passwordCount++;

    int len = strlen(password);
    for (int start = 0; start < len; start++) {
        TrieNode* curr = root;
        for (int i = start; i < len; i++) {
            int idx = password[i] - 'a';
            if (!curr->children[idx])
                curr->children[idx] = createTrieNode();
            curr = curr->children[idx];
            addPasswordToNode(curr, password);
        }
    }
}

int query(const char* queryString) {
    TrieNode* curr = root;
    int len = strlen(queryString);
    for (int i = 0; i < len; i++) {
        int idx = queryString[i] - 'a';
        if (!curr->children[idx])
            return 0;
        curr = curr->children[idx];
    }

    int counter = 0;
    PasswordNode* p = curr->passwords;
    while (p != NULL) {
        if (strcmp(p->password, queryString) != 0)
            counter++;
        p = p->next;
    }
    return counter;
}

void freeTrie(TrieNode* node) {
    if (!node) return;
    for (int i = 0; i < ALPHABET_SIZE; i++)
        freeTrie(node->children[i]);

    PasswordNode* p = node->passwords;
    while (p) {
        PasswordNode* temp = p;
        p = p->next;
        free(temp->password);
        free(temp);
    }
    free(node);
}

int main() {
    int operations;
    scanf("%d", &operations);
    int opcode;
    char password[MAX_PASSWORD_LEN];

    root = createTrieNode();

    for (int i = 0; i < operations; i++) {
        scanf("%d %s", &opcode, password);
        if (opcode == 1) {
            add(password);
        } else if (opcode == 2) {
            printf("%d\n", query(password));
        }
    }

    // clean up
    for (int i = 0; i < passwordCount; i++)
        free(passwords[i]);
    freeTrie(root);

    return 0;
}

 

Do not underestimate AI, buddy. I have many LeetCode submissions where a silly bug had me banging my head debugging for hours, but a short copy and paste into AI immediately spits out the line where the bug occurs. Use AI like StackOverflow. This is pretty much what AI is, just a better version of online googling. it wont work 100% usually, that's true but it is quite close. After some manual tweaks, it can be adjusted to work correctly, saving much time. 

Sudo make me a sandwich 

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16749376
Share on other sites

Link to post
Share on other sites

2 hours ago, wasab said:

📌 Summary of problem:

  • Adding a password → store all possible substrings of the password.

  • Querying a string → count how many other passwords (not equal to the query string itself) have this string as a substring.


📌 Problematic behavior in your implementation:

Bug location is in how substrings are hashed and stored.

Your current implementation:

  • At each add(password) call:

    • Creates all possible substrings.

    • Stores an entry for each substring in a hash table substrings with the substring as substr, and the original password as ogPassword.

In query(password, password):

  • Hash the query password and check the linked list at that hash bucket.

  • For each matching substr equal to query string and different ogPassword, increment counter.

⚠️ The issue:
You hash each substring individually by value. But — different substrings from different passwords could hash to the same bucket, and your lookup only scans the linked list at a single hash bucket for a given query string's hash value.

That means:

  • You’ll only see entries whose hashed value exactly matches the hash of the full query string.

  • But many substrings of other passwords containing the query string could hash to other hash buckets.

→ The hashing strategy here breaks correctness because you can't rely on the hash bucket of the query string to contain all substrings equal to it.
Two identical strings will always hash to the same value, yes — but you would have had to insert that exact string as a substring somewhere to be in that bucket.

In short:
The current design can only ever find substrings equal to the query string if it was itself inserted as a substring during a password’s add operation.
This works, but it's highly inefficient: for querying, you'd need to find all passwords whose substrings contain the query string — and with only a hash lookup by exact string length, you miss all other potential locations.


📌 Correct, scalable design should either:

  • Use a trie (prefix tree) to store all substrings and allow efficient substring matching during queries.

  • Brute-force: Store a list of all passwords, and on query, for each password (except the query one itself), check strstr() if query string is a substring.

    • It’s acceptable here since passwords are short (≤10 characters) and per problem constraints.


📌 Minimal working fix:

If sticking to your current hash-table based approach:

  • During query(password, password) — instead of hashing the query and looking only at that bucket, you'd need to:

    • Iterate over all hash buckets.

    • Check each entry in all linked lists.

    • For entries where substr == password and ogPassword != password, increment the counter.

Why? Because identical substrings could hash to different buckets unless the hash function is perfectly consistent (but that's unnecessary overhead for this).

 

 

Option 2: Trie-Based Substring Indexing

Idea:

  • Each node represents a character.

  • Paths down the trie form substrings.

  • Each node tracks which passwords inserted this substring (as a linked list or set).

  • On query: follow the trie path for the query string. If path exists — count number of unique passwords associated with this substring (excluding the query password itself).

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PASSWORD_LEN 12
#define ALPHABET_SIZE 26
#define MAX_USERS 131072

typedef struct PasswordNode {
    char* password;
    struct PasswordNode* next;
} PasswordNode;

typedef struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    PasswordNode* passwords;
} TrieNode;

TrieNode* createTrieNode() {
    TrieNode* node = (TrieNode*)malloc(sizeof(TrieNode));
    for (int i = 0; i < ALPHABET_SIZE; i++)
        node->children[i] = NULL;
    node->passwords = NULL;
    return node;
}

TrieNode* root;
char* passwords[MAX_USERS];
int passwordCount = 0;

void addPasswordToNode(TrieNode* node, const char* password) {
    PasswordNode* p = node->passwords;
    while (p != NULL) {
        if (strcmp(p->password, password) == 0) return;  // already recorded
        p = p->next;
    }

    PasswordNode* newPasswordNode = (PasswordNode*)malloc(sizeof(PasswordNode));
    newPasswordNode->password = (char*)malloc(strlen(password) + 1);
    strcpy(newPasswordNode->password, password);
    newPasswordNode->next = node->passwords;
    node->passwords = newPasswordNode;
}

void add(const char* password) {
    passwords[passwordCount] = malloc(strlen(password) + 1);
    strcpy(passwords[passwordCount], password);
    passwordCount++;

    int len = strlen(password);
    for (int start = 0; start < len; start++) {
        TrieNode* curr = root;
        for (int i = start; i < len; i++) {
            int idx = password[i] - 'a';
            if (!curr->children[idx])
                curr->children[idx] = createTrieNode();
            curr = curr->children[idx];
            addPasswordToNode(curr, password);
        }
    }
}

int query(const char* queryString) {
    TrieNode* curr = root;
    int len = strlen(queryString);
    for (int i = 0; i < len; i++) {
        int idx = queryString[i] - 'a';
        if (!curr->children[idx])
            return 0;
        curr = curr->children[idx];
    }

    int counter = 0;
    PasswordNode* p = curr->passwords;
    while (p != NULL) {
        if (strcmp(p->password, queryString) != 0)
            counter++;
        p = p->next;
    }
    return counter;
}

void freeTrie(TrieNode* node) {
    if (!node) return;
    for (int i = 0; i < ALPHABET_SIZE; i++)
        freeTrie(node->children[i]);

    PasswordNode* p = node->passwords;
    while (p) {
        PasswordNode* temp = p;
        p = p->next;
        free(temp->password);
        free(temp);
    }
    free(node);
}

int main() {
    int operations;
    scanf("%d", &operations);
    int opcode;
    char password[MAX_PASSWORD_LEN];

    root = createTrieNode();

    for (int i = 0; i < operations; i++) {
        scanf("%d %s", &opcode, password);
        if (opcode == 1) {
            add(password);
        } else if (opcode == 2) {
            printf("%d\n", query(password));
        }
    }

    // clean up
    for (int i = 0; i < passwordCount; i++)
        free(passwords[i]);
    freeTrie(root);

    return 0;
}

 

Do not underestimate AI, buddy. I have many LeetCode submissions where a silly bug had me banging my head debugging for hours, but a short copy and paste into AI immediately spits out the line where the bug occurs. Use AI like StackOverflow. This is pretty much what AI is, just a better version of online googling. it wont work 100% usually, that's true but it is quite close. After some manual tweaks, it can be adjusted to work correctly, saving much time. 

Thanks for the in depth explanation.

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16749402
Share on other sites

Link to post
Share on other sites

  • 4 weeks later...
On 6/15/2025 at 7:38 PM, wasab said:

Do not underestimate AI, buddy. I have many LeetCode submissions where a silly bug had me banging my head debugging for hours, but a short copy and paste into AI immediately spits out the line where the bug occurs. Use AI like StackOverflow. This is pretty much what AI is, just a better version of online googling. it wont work 100% usually, that's true but it is quite close. After some manual tweaks, it can be adjusted to work correctly, saving much time. 

While AI is great using AI as a crutch to complete work isn't going to let you learn correctly.  AI also has pitfalls that beginners will not likely notice and then fall into very bad habits.

 

Case in point, it fails to generate properly working code:

2

1 aaa

2 aaa

 

It returns 0 (clearly off here)

 

3

1 aaa

1 aaa

2 a

 

It returns 1 (no where does it state that passwords are unique, just how many passwords match...so you can have 2 passwords the same)

 

The other thing is while it talks about issues with the original code, it fails to point out using unsafe functions.  I could easily overflow things by putting in a password longer than 10 characters.

 

Overall I wouldn't be impressed by what it had produced.  While it does admittedly get closer to the idea I would use, overall things seem inefficient.

 

To @g0ated at least from a glance of your code, ultimately what fails is that you are slicing up the passwords and inserting them which means that something like

aaa is being added as a, aa, aaa (it's actually worse because you have the double for loop) into essentially the password list without the proper distinction that they truly belong to a single password.  While it's nice in that it means you can do more simple comparisons later I would say it's not worth the memory inefficiency by doing it like that.  (Like at that point it's almost better to just brute force it every single time).

 

There's also the general issue about hash collisions.  If you insert into the table but there already was one that matched you will not insert a new version of it (despite the fact that users can have the same password).  This creates a general issue overall.  In this case I would just completely do away with hashing all together and stick to some form of list.

 

I know it's a late reply, but it's my 2 cents in regards to this because the AI answer I found to be inadequate and potentially misleading

3735928559 - Beware of the dead beef

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16764853
Share on other sites

Link to post
Share on other sites

3 hours ago, wanderingfool2 said:

While AI is great using AI as a crutch to complete work isn't going to let you learn correctly.  AI also has pitfalls that beginners will not likely notice and then fall into very bad habits.

Yeah, it's probably not the right answer. I never tested the code it spew out. Didn't mean to tell people to copy everything from AI verbatim. The issues it pointed out with op's original code snippet sounds good enough. 

Sudo make me a sandwich 

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16764934
Share on other sites

Link to post
Share on other sites

7 hours ago, wasab said:

Yeah, it's probably not the right answer. I never tested the code it spew out. Didn't mean to tell people to copy everything from AI verbatim. The issues it pointed out with op's original code snippet sounds good enough. 

Hashing, and in general the hash buckets are a potential issue...but I would argue that it's not the fundamental issue.  I think the issue is more about breaking down each password into multiple substrings etc.

 

It's actually why I don't necessarily agree with just purely putting code into AI and asking it to fix it when there might not originally be an understanding of algorithms etc.  It's a good ability to be able to analyze code yourself and figure out where things are breaking/thinking about why it's breaking.  If you always use AI to do such a thing you can quickly forget about the trivial little things that you think the AI will have handled.

 

As another example, MAX_USERS...strictly speaking the max users can only ever be 100,000

Honestly though the use of unsafe functions is quite telling as well...people will think it works and for the most cases it will be but things like

2

1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

2 a

will corrupt the memory,

 

You also will have memory corruptions/reads when using

2

1 ABCDEFG

2 A

 

As it doesn't have proper error handling when char - a >= 26 (i.e. like capital letters) and more so when char - a < 0

 

 

It's why AI can be a powerful learning tool, but at the same time you need to be aware that the answers and information it can tell can be garbage or very bad habits (that once upon a time wouldn't have been considered bad but in the modern day is)

3735928559 - Beware of the dead beef

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16765062
Share on other sites

Link to post
Share on other sites

9 hours ago, wasab said:

Yeah, it's probably not the right answer. I never tested the code it spew out. Didn't mean to tell people to copy everything from AI verbatim. The issues it pointed out with op's original code snippet sounds good enough. 

I ended up not using a trie since it was not covered in my algo book yet so I choose to use an array of substring to check stuff(kinda forgot about the problem) but here is my solution:

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define LEN 11
#define TABLE_SIZE 131072

#define hashsize(n) ((unsigned long)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

unsigned long oaat(const char *key, unsigned long len, unsigned long bits) {
    unsigned long hash, i;
    for (hash = 0, i = 0; i < len; i++) {
    hash += key[i];
    hash += (hash << 10);
    hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash & hashmask(bits);
}

typedef struct entry {
    int count;
    char* substring;
    struct entry* next;
} entry;

static entry* hashtable[TABLE_SIZE] = {NULL};

char* slice(const char* str, int start, int end) {
    char* substringBuffer = (char *) malloc(LEN * sizeof(char));
    int j = 0;
    for (int i = start; i <= end; i++) {
        substringBuffer[j] = str[i];
        j++;
    }
    substringBuffer[j] = '\0';
    return substringBuffer;
}


void add (const char *password) {
    int key;
    int len = strlen(password);
    char* substrings_seen[100];
    int num_seen = 0;
    for (int i = 0; i < len; i++) {
        for (int j = i; j < len; j++) {
            bool seenBefore = false;
            char* substring;
            substring = slice(password, i, j);

            for (int k = 0; k < num_seen; k++) {
                if (strcmp(substrings_seen[k], substring) == 0) {
                    seenBefore = true;
                    break;
                }
            }

            if (!seenBefore) {
                substrings_seen[num_seen++] = substring;
                key = oaat(substring, strlen(substring), 17);
                entry* cur = hashtable[key];
                int isThere = 0;
                while (cur != NULL) { 
                    if (!(strcmp(cur->substring, substring))) {
                        cur->count++;
                        isThere = 1;
                        break;
                    }
                    cur = cur->next;
                }
                if (!isThere) {
                    cur = (entry *) malloc(sizeof(entry));
                    cur->count = 1;
                    cur->substring = substring;
                    cur->next = hashtable[key];
                    hashtable[key] = cur;
                }
            } else {
                free(substring);
            }
        }
    }
}

int query (const char *str) {
    int key = oaat(str, strlen(str), 17);
    entry* cur = hashtable[key];
    while (cur != NULL) {
        if(!strcmp(cur->substring, str)) {
            return cur->count;
        }
        cur = cur->next;
    }
    return 0;
}


int main(int argc, char** argv) {
    int operations, opcode;
    char password[LEN];
    scanf("%d", &operations);

    for (int i = 0; i < operations; i++) {
        scanf("%d%s", &opcode, password);

        if (opcode == 1) {
            add(password);
        } else {
            printf("%d\n", query(password));
        }
    }
}

 

2 hours ago, wanderingfool2 said:

Hashing, and in general the hash buckets are a potential issue...but I would argue that it's not the fundamental issue.  I think the issue is more about breaking down each password into multiple substrings etc.

 

It's actually why I don't necessarily agree with just purely putting code into AI and asking it to fix it when there might not originally be an understanding of algorithms etc.  It's a good ability to be able to analyze code yourself and figure out where things are breaking/thinking about why it's breaking.  If you always use AI to do such a thing you can quickly forget about the trivial little things that you think the AI will have handled.

 

As another example, MAX_USERS...strictly speaking the max users can only ever be 100,000

Honestly though the use of unsafe functions is quite telling as well...people will think it works and for the most cases it will be but things like

2

1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

2 a

will corrupt the memory,

 

You also will have memory corruptions/reads when using

2

1 ABCDEFG

2 A

 

As it doesn't have proper error handling when char - a >= 26 (i.e. like capital letters) and more so when char - a < 0

 

 

It's why AI can be a powerful learning tool, but at the same time you need to be aware that the answers and information it can tell can be garbage or very bad habits (that once upon a time wouldn't have been considered bad but in the modern day is)

Thanks for your two cents, I ended up asking chatgpt about the bug in my code which @wasab helped me figure out, and it gave me some code that I could add to fix it. AI IMO only excels if you know what the problem is. It's quite terrible in terms of finding errors in huge code bases or when the problem like this one is feed into it. 

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16765102
Share on other sites

Link to post
Share on other sites

On 7/12/2025 at 1:11 AM, g0ated said:

Thanks for your two cents, I ended up asking chatgpt about the bug in my code which @wasab helped me figure out, and it gave me some code that I could add to fix it. AI IMO only excels if you know what the problem is. It's quite terrible in terms of finding errors in huge code bases or when the problem like this one is feed into it. 

A few comments, just as an educational thing.

 

sscanf can be a dangerous function since it doesn't bother doing buffer checks.  It's why in your version if I wanted to I could cause it to crash with specific inputs (and if I had enough time I could write a payload which would be exploitable).

 

Definitely a good first attempt, and honestly a lot of people probably do end up writing code like that in their career (I know I've written my fair share of exploitable code...but mainly for inhouse programs that speed of development trumps the maybe 100 users who use it so I didn't view as a security risk).

 

Anyways, scanf can be troublesome, but there are ways around it.

 

fgets is better in that it accepts a buffer.

 

e.g.

char lineBuffer[32];
fgets(lineBuffer, sizeof lineBuffer, stdin);

The above, even if they put in 2 asdflkajtwlkejttoaijsdfoiawjflwjaeflkjasdlfkjalsdkjfalskdjflaskjfdlasjkdflaksdjflaskjdfa

you will not get it corrupting memory.

 

Good that you changed using a straight hash function to adding it in with the idea of collisions being potentially present.

 

Your substrings_seen[100] is using a magic number, and strictly speaking is overkill for a password limit of 10.  At a size limit of 10, there are technically only 55 entries (10 + 1) * 10) / 2.  Although then again the driving force is the 55 new passwords stored each time, so I guess it doesn't matter as much...but still something to think of.

 

Not that I recommend doing this, but a neat optimization that could occur (if lets say you were doing to have a lot of large numbers etc)...contrained to a - z, that means there are exactly 26 options (so you can store it between 0 - 25)...which means you could use 5 bits to represent each character...which means a UINT64 (unsigned long long) you could hold a password of length 12...and by having it as an integer it makes comparisons so much quicker....but that's just an over optimization at that point :p

3735928559 - Beware of the dead beef

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16766380
Share on other sites

Link to post
Share on other sites

I thought this problem was interesting and I haven't used C in several years, so I tried it myself without having looked at any of the other solutions. Here's my code:

 

Spoiler
#include <stdio.h>
#include <malloc.h>
#include <string.h>

// BEGIN SET

#define STRINGSET_STARTING_CAPACITY 10

struct stringset {
    char** strings;
    size_t size;
    size_t capacity;
};

struct stringset stringset_init() {
    struct stringset set;
    set.strings = malloc(sizeof(char*)*STRINGSET_STARTING_CAPACITY);
    set.capacity = STRINGSET_STARTING_CAPACITY;
    set.size = 0;
    return set;
}

void stringset_add(struct stringset* set, char* str) {
    // First ensure the set doesn't already contain the specified string
    for (size_t i = 0; i < set->size; i++) {
        if (strcmp(str, set->strings[i]) == 0) {
            // Set already contains the string, nothing left to do
            return;
        }
    }

    // The set doesn't already contain the string, so we add it, expanding the backing array if needed
    if (set->size == set->capacity) {
        set->strings = realloc(set->strings, set->capacity*2*sizeof(char*));
        set->capacity *= 2;
    }

    set->strings[set->size] = str;
    set->size++;
}

void stringset_destroy(struct stringset* set) {
    for (size_t i = 0; i < set->size; i++) {
        free(set->strings[i]);
    }
    free(set->strings);
}

// END SET

// BEGIN TRIE

#define NUM_ALLOWED_CHARS 26

struct node {
    unsigned int data;
    struct node* children[NUM_ALLOWED_CHARS];
};

void trie_add(struct node* root_node, char* str) {
    struct node* curr_node = root_node;

    // Walk down the trie, using the current char within the string to select the child at each node
    for (size_t i = 0; i < strlen(str); i++) {
        size_t index = str[i] - 'a';

        // Create the next child node if it doesn't exist yet
        if (curr_node->children[index] == NULL) {
            struct node *new_node = malloc(sizeof(struct node));
            new_node->data = 0;
            memset(new_node->children, 0, NUM_ALLOWED_CHARS * sizeof(struct node *));
            curr_node->children[index] = new_node;
        }

        // Go 1 step further down the trie
        curr_node = curr_node->children[index];
    }

    // At this point, we are at the node targeted by the input string
    curr_node->data++;
}

unsigned int trie_query(struct node* root_node, char* str) {
    struct node* curr_node = root_node;

    // Use the same strategy as trie_add, but immediately stop if we encounter a child node that doesn't exist yet
    for (size_t i = 0; i < strlen(str); i++) {
        size_t index = str[i] - 'a';

        if (curr_node->children[index] == NULL) {
            // This node was never added, so the password must appear 0 times
            return 0;
        } else {
            // Go 1 step further down the trie
            curr_node = curr_node->children[index];
        }
    }

    // At this point, we are at the node targeted by the input string
    return curr_node->data;
}

// END TRIE

#define MAX_PASSWORD_LENGTH 10

struct stringset get_substrings(char* str) {
    struct stringset ss = stringset_init();

    for (size_t num_chars_to_copy = 1; num_chars_to_copy <= strlen(str); num_chars_to_copy++) {
        for (size_t str_index = 0; str_index < strlen(str) - num_chars_to_copy + 1; str_index++) {
            char* substr = calloc(num_chars_to_copy+1, sizeof(char));
            strncpy(substr, str+str_index, num_chars_to_copy);
            stringset_add(&ss, substr);
        }
    }

    return ss;
}

int main() {
    struct node root;
    root.data = 0;
    memset(root.children, 0, NUM_ALLOWED_CHARS*sizeof(struct node*));
    char password[MAX_PASSWORD_LENGTH+1];
    memset(password, 0, MAX_PASSWORD_LENGTH+1);

    unsigned int n_queries;
    scanf("%u", &n_queries);

    for (unsigned int i = 0; i < n_queries; i++) {
        unsigned int query_type;
        scanf("%u %s", &query_type, password);

        if (query_type == 1) {
            struct stringset substrings = get_substrings(password);
            for (size_t index = 0; index < substrings.size; index++) {
                trie_add(&root, substrings.strings[index]);
            }
            stringset_destroy(&substrings);
        } else {
            unsigned int n_accounts = trie_query(&root, password);
            printf("%u\n", n_accounts);
        }

        memset(password, 0, MAX_PASSWORD_LENGTH+1);
    }

    return 0;
}

 

 

Explanation:

Spoiler

My solution is centered around a modified trie, which provides quick insertion and access. When adding a password, I first split it into all possible substrings and add them to a set so that it only contains unique substrings. For the input password "aaa", the set will only contain the substrings "a", "aa", and "aaa".

 

Once we have all unique substrings, we add them to a trie where each node contains the number of times its associated substring appeared as well as an array of 26 pointers to child nodes. These child pointers are only populated when needed, during an insertion operation.

 

To add the substring "abc" to the trie, we increment the value of the node at root->a->b->c.

To query the trie with substring "xy", we return the value at node root->x->y.

 

I'm aware there are MANY optimizations possible here, but it ran within the time and memory limits so I stopped there.

 

And yes, I know you should check the return values of scanf and malloc-and-friends, and use fgets so that user input can't cause a buffer overflow. Oh well 🙃

 

Now looking at your code, a hash table is a valid way of solving this problem. And yes, it looks like the original problem was with generating the unique substrings.

Computer engineering PhD student, machine learning researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 32GB DDR4 3200MHz C16 | OS: Debian 12

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16 | OS: Windows 11

Link to comment
https://linustechtips.com/topic/1615170-algo-problem-in-c/#findComment-16766456
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×