Jump to content

Algorithm searching in strings with wildcards (*,?) in C

Go to solution Solved by rednas16,

this is the bug free code

//0 = no match//1 = match//When pattern is "" will return a matchint matchesPattern( char* input, char* pattern) {	int i, z;	if(pattern[0] == '\0')		return 1; 	for(i = 0; pattern[i] != '\0'; i++) 	{		if(pattern[i] == '\0') 		{			return 0; //Pattern has ended but still some left, meaning no match		}		else if(pattern[i] == '?')		{			continue; //? replaces a character, no need to check if there is a match		}		else if(pattern[i] == '*') 		{			//Pattern has matched up until now, but now need to check			//the rest of the pattern  			//Checking for the match of pattern starting after the *			//If the match isn't found, shift input by 1 and check again			//If no match is found then pattern doesn't exist			for(z = i; input[z] != '\0'; z++) 			{				if(matchesPattern(input + z, pattern + i + 1) == 1) 				{					//Pattern found, return true					return 1;				}			}			//If this line was hit the pattern after * couldn't be found anywhere			return 0;		}		else if(pattern[i] != input[i])		{			return 0; //Didn't match input		}		//Characters matched, keep continuing	}	//To hit here the current pattern must contain no * and must have all characters matching	return 1;}
 
hello i'm from belgium, sorry for the not that great english
 
does anyone know an algorithm that can search in strings including with wildcards like "?", "*" and combinations like "test?test*"
 
i'm programming in c and use char* strings
 
thx a lot 
rednas16
 

 

Link to comment
Share on other sites

Link to post
Share on other sites

Yes, there are several ways to do that, some are much better than others.

 

It depends on what you want to achieve. Do you want to implement this, or do you simply need something that does this? If you just need something that does this you can use regular expressions (POSIX library, for example).

If you're searching for pathnames that match it you could use glob.

 

If you want to implement it efficiently, I can point you in the right direction, but you'll have to learn quite a bit of stuff. If not, you can just iterate through the string and check for the patterns repeatedly (this isn't very efficient but it's very easy to implement).

Want to solve problems? Check this out.

Link to comment
Share on other sites

Link to post
Share on other sites

i need to implement it, just a basic program like : enter a couple of words, give the term what you are looking for (including the wordcards) and print again the words which match with the term ..

do you maybe also have some code from that iteration?
 

Yes, there are several ways to do that, some are much better than others.

 

It depends on what you want to achieve. Do you want to implement this, or do you simply need something that does this? If you just need something that does this you can use regular expressions (POSIX library, for example).

If you're searching for pathnames that match it you could use glob.

 

If you want to implement it efficiently, I can point you in the right direction, but you'll have to learn quite a bit of stuff. If not, you can just iterate through the string and check for the patterns repeatedly (this isn't very efficient but it's very easy to implement).

Link to comment
Share on other sites

Link to post
Share on other sites

It all will depend on the behaviour you want when using the wildcards.  Some wild card searches might be like so

Type 1:

test123* (valid)

test*123 (valid)

test??? (valid)

test???test (valid)
*test (valid)

 

Type 2:

test123* (valid)

test*123 (invalid)

test??? (valid)

test???test (invalid)
*test (invalid)

 

In terms of code wolfsinner' links were a good suggestion.

 

For implementation of an algorithm yourself, personally I would do something like this to start (not optimized, horribly recursive, and hasn't even been compiled to check for errors but the pattern should be easy enough to follow and understand)

//0 = no match//1 = match//When pattern is "" will return a matchint matchesPattern(const char* input, const char* pattern) {	int i, z;	if(pattern[0] == '\0')		return 1;	for(i = 0; input[i] != '\0'; i++) {		if(pattern[i] == '\0') {			return 0; //Pattern has ended but still some left, meaning no match		}		else if(pattern[i] == '?') {			continue; //? replaces a character, no need to check if there is a match		}		else if(pattern[i] == '*') {			//Pattern has matched up until now, but now need to check			//the rest of the pattern 			//Checking for the match of pattern starting after the *			//If the match isn't found, shift input by 1 and check again			//If no match is found then pattern doesn't exist			for(z = i; input[z] != '\0'; z++) {				if(matchesPattern(input + z, pattern + i + 1) == 1) {					//Pattern found, return true					return 1;				}			}			//If this line was hit the pattern after * couldn't be found anywhere			return 0;		}		else if(pattern[i] != input[i])			return 0; //Didn't match input		}		//Characters matched, keep continuing	}	//To hit here the current pattern must contain no * and must have all characters matching	return 1;}

Doubt this thing would compile, but it is a good pseudo code at least for an algorithm to check for Type 1 (I hope at least :p )

0b10111010 10101101 11110000 00001101

Link to comment
Share on other sites

Link to post
Share on other sites

i need to implement it, just a basic program like : enter a couple of words, give the term what you are looking for (including the wordcards) and print again the words which match with the term ..

do you maybe also have some code from that iteration?

 

 

I wouldn't mind giving you some code in most circumstances, but I smell homework/assignment. I really believe that in those cases you should make an effort to implement your own solution.

 

The general idea is simple: You iterate through both the string and the pattern characters. If the pattern has anything other than ? or *, you test it against the string. If it has a ?, you test for whatever that wildcard matches (it usually serves as a placeholder for any character). If it has a *, you test for whatever that wildcard matches (it usually matches a string of 0 or more characters, followed by whatever is after it).

The only wildcard that could pose some kind of challenge is the asterisk. You need to make sure that you test whatever is after it entirely, and if something fails you have to completely go back on the pattern and repeat it for the following character.

I'd recommend you pre-process the pattern string to remove repeated * (such as in a***b -> a*b ) if the the asterisk represents 0 or more characters. In the case that it represents 1 or more characters, you would pre-process it as a***b -> a??*b.

 

This is an explanation for a non-recursive solution. If you're comfortable with recursion, you could always use it to write a prettier (albeit less efficient) solution, similar to what WanderingFool wrote above.

 

Although, keep in mind that this is not an efficient approach in general (both the iterative and recursive versions of this solution).

Want to solve problems? Check this out.

Link to comment
Share on other sites

Link to post
Share on other sites

i can't deny that it is for my second chance project

 

yesterday, after your first comment i thought well about the problem an concluded the same general idea like you said . and i tried to generate my own code but failed :(

so, it would be great if you give your code but i'll understand if you don't.. i'm just kinda out of ideas

 

thx anyway
 

I wouldn't mind giving you some code in most circumstances, but I smell homework/assignment. I really believe that in those cases you should make an effort to implement your own solution.

 

The general idea is simple: You iterate through both the string and the pattern characters. If the pattern has anything other than ? or *, you test it against the string. If it has a ?, you test for whatever that wildcard matches (it usually serves as a placeholder for any character). If it has a *, you test for whatever that wildcard matches (it usually matches a string of 0 or more characters, followed by whatever is after it).

The only wildcard that could pose some kind of challenge is the asterisk. You need to make sure that you test whatever is after it entirely, and if something fails you have to completely go back on the pattern and repeat it for the following character.

I'd recommend you pre-process the pattern string to remove repeated * (such as in a***b -> a*b ) if the the asterisk represents 0 or more characters. In the case that it represents 1 or more characters, you would pre-process it as a***b -> a??*b.

 

This is an explanation for a non-recursive solution. If you're comfortable with recursion, you could always use it to write a prettier (albeit less efficient) solution, similar to what WanderingFool wrote above.

 

Although, keep in mind that this is not an efficient approach in general (both the iterative and recursive versions of this solution).

Link to comment
Share on other sites

Link to post
Share on other sites



It all will depend on the behaviour you want when using the wildcards. Some wild card searches might be like so
Type 1:
test123* (valid)
test*123 (valid)
test??? (valid)
test???test (valid)
*test (valid)

Type 2:
test123* (valid)
test*123 (invalid)
test??? (valid)
test???test (invalid)
*test (invalid)

In terms of code wolfsinner' links were a good suggestion.

For implementation of an algorithm yourself, personally I would do something like this to start (not optimized, horribly recursive, and hasn't even been compiled to check for errors but the pattern should be easy enough to follow and understand)
//0 = no match//1 = match//When pattern is "" will return a matchint matchesPattern(const char* input, const char* pattern) { int i, z; if(pattern[0] == '\0') return 1; for(i = 0; input[i] != '\0'; i++) { if(pattern[i] == '\0') { return 0; //Pattern has ended but still some left, meaning no match } else if(pattern[i] == '?') { continue; //? replaces a character, no need to check if there is a match } else if(pattern[i] == '*') { //Pattern has matched up until now, but now need to check //the rest of the pattern //Checking for the match of pattern starting after the * //If the match isn't found, shift input by 1 and check again //If no match is found then pattern doesn't exist for(z = i; input[z] != '\0'; z++) { if(matchesPattern(input + z, pattern + i + 1) == 1) { //Pattern found, return true return 1; } } //If this line was hit the pattern after * couldn't be found anywhere return 0; } else if(pattern[i] != input[i]) return 0; //Didn't match input } //Characters matched, keep continuing } //To hit here the current pattern must contain no * and must have all characters matching return 1;}
Doubt this thing would compile, but it is a good pseudo code at least for an algorithm to check for Type 1 (I hope at least :P )



i compiled it, you forgot one '{' after the last "else if"

everting worked fine, except:

when i filled in: "test" and "t?st" it gave true, no problem with that so far

but when i filled in : "tst" and "t?st" it gave also true, but isn't it supposed to give false.. cause "?" means that there is something between the 't' and the 's' .. or is that a misunderstanding of me and "?"

thx anyway !
Link to comment
Share on other sites

Link to post
Share on other sites

this is the bug free code

//0 = no match//1 = match//When pattern is "" will return a matchint matchesPattern( char* input, char* pattern) {	int i, z;	if(pattern[0] == '\0')		return 1; 	for(i = 0; pattern[i] != '\0'; i++) 	{		if(pattern[i] == '\0') 		{			return 0; //Pattern has ended but still some left, meaning no match		}		else if(pattern[i] == '?')		{			continue; //? replaces a character, no need to check if there is a match		}		else if(pattern[i] == '*') 		{			//Pattern has matched up until now, but now need to check			//the rest of the pattern  			//Checking for the match of pattern starting after the *			//If the match isn't found, shift input by 1 and check again			//If no match is found then pattern doesn't exist			for(z = i; input[z] != '\0'; z++) 			{				if(matchesPattern(input + z, pattern + i + 1) == 1) 				{					//Pattern found, return true					return 1;				}			}			//If this line was hit the pattern after * couldn't be found anywhere			return 0;		}		else if(pattern[i] != input[i])		{			return 0; //Didn't match input		}		//Characters matched, keep continuing	}	//To hit here the current pattern must contain no * and must have all characters matching	return 1;}
Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×