
Hi all, 

 

I am working on a platform that allows users to be created, and a question was asked: "What happens if someone wants to sign up with a curse word?"

 

And it got me thinking, what is the best way to filter these words to make sure they do not show up in people's usernames?

Am I going to have to write a function to search for these words and create the list myself?

 

I don't mind having some words in the source code, but others might not be great to have in there, haha.

 

Please let me know your thoughts.

https://linustechtips.com/topic/1225787-curse-word-filtering/

13 minutes ago, DigitalGoat said:

Why not use a resource file containing a list of words your program can check against? Maybe a database-style one that you could edit separately to add more words as you become aware of them.

I was leaning towards the database route myself as well. I just realized we will need to filter words in other languages too, haha.

 


35 minutes ago, ScottDurkin said:

And it got me thinking, what is the best way to filter these words to make sure they do not show up in people's usernames?

Make a file that contains 'all the curse words' and then check the username against that file.

That way the words aren't written directly in the source code, but in a file that ships with the code.

You could also make a database table with these words, but the issue with that is losing the connection to the DB: if the check then silently passes, the banned names are suddenly allowed.
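A minimal Python sketch of the file-based check described above. The function names and the one-word-per-line file format are assumptions for illustration, not something from an existing library. It also fails closed: if the list turns out to be empty (for example because loading went wrong), it refuses to approve names instead of letting everything through.

```python
from pathlib import Path


def load_blocked_words(path):
    """Read one blocked word per line, skipping blank lines and # comments."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def is_username_allowed(username, blocked_words):
    """Return False if any blocked word appears anywhere in the username."""
    if not blocked_words:
        # Fail closed: an empty list must not approve every name.
        raise ValueError("blocked word list is empty; refusing to approve names")
    lowered = username.lower()
    return not any(word in lowered for word in blocked_words)
```

With a file containing the lines "sample" and "damn", a name such as "xXSampleXx" would be rejected and "GoodName" accepted. This is a plain substring match, so it has the false-positive problem discussed further down.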

 

But you're gonna have to put some more effort into it too. You need to 'translate' some of the characters as well.

For example, I could use the letter 'i', but I could also use the number '1', a pipe '|', or a lowercase L ('l'), etc. Same for zero (0) and O, or 3 and E, etc. So you might need to write a translator for those.

So if you want to block the word "sample", you also need to block "s4mple", "sampl3" and "s4mpl3" (or even versions with 5 in place of s, etc.).

 

Before you check if the word is allowed, you would need to 'translate' usernames too.
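A rough Python sketch of such a 'translator', applied to the username before it is checked. The substitution table and function names are made up for illustration and are nowhere near complete. Ambiguous lookalikes such as a lowercase L standing in for an i are deliberately left out, because mapping them blindly would also mangle ordinary words.

```python
# Hypothetical substitution table; extend it for your own user base.
LEET_MAP = str.maketrans({
    "0": "o",
    "1": "i",
    "|": "i",
    "3": "e",
    "4": "a",
    "5": "s",
})


def normalize(username):
    """Lowercase the name and undo common digit/symbol substitutions."""
    return username.lower().translate(LEET_MAP)


def contains_blocked_word(username, blocked_words):
    """Check the translated username against the blocked words."""
    normalized = normalize(username)
    return any(word in normalized for word in blocked_words)
```

Here `normalize("s4mpl3")` gives `"sample"`, so a single entry in the word list covers all of the spelling variants above.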

To give a bit of inspiration, here is a list of nicknames Pokemon doesn't allow players to use (or at least doesn't allow to be traded online). Obviously 'viewer discretion advised', since this page contains (many) 'bad words' :P https://bulbapedia.bulbagarden.net/wiki/List_of_censored_words_in_Generation_V

But as you can see, they for example blocked the word "damn", but not "d4mn". Whether you want to block the latter is up to you, but if you do, it's probably best to build in a sort of 'translator'.

 

Then again, by blocking certain words, you do run into the issue of blocking potentially harmless names too.

There is a Pokemon called 'Cofagrigus', which you were not allowed to trade online unless you changed its name. The name comes from coffin ('Cofa') and egregious ('grigus'). Unfortunately, the middle of the word contains an informal derogatory word for 'homosexual person'. That means they later had to build in an exception for this Pokemon, but that of course brings some more difficulties too.

 

Or you can just download a ready-made library for your programming language; one probably exists.

 

EDIT: oh, and to say something more about that 'Cofagrigus' story, you also have to set boundaries on the filtering.

For example, you might very well want to ban all use of Nazi words and such, which is very understandable (to put it lightly). Well, one of the 'terms' they used was 'SS'. But banning those two characters next to each other is going to be very hard, as you will also be banning any word that contains those two letters (lass, kiss, stress, miss, etc.). So you would need an extra filter for that. And then another filter to make sure people are not using that exception to get around the censorship.

 

As you might have gathered, censoring usernames is a cat-and-mouse game. For every filter you make, there are multiple exceptions that need to be made, and exceptions to those exceptions.

It's best to just focus on the most egregious words featured in the language(s) of your users.

 

Plus, in the Unicode standard, there are currently more than 140K different characters. That's your basic ABCs, numbers and special characters, but also Hebrew, Arabic, Japanese, etc. and other 'weird symbols' (think stuff like the symbols for spades, hearts or arrows).

It's going to be next to impossible to, for example, find every character that somewhat resembles an i or an e. So if people want to, they can always find a way around the name banning (unless you disallow anything other than ABC, 123, !@# for example, which might piss off people who use different alphabets).
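As a partial mitigation, Python's standard `unicodedata` module can fold some lookalikes back to plain letters before the word check. This is only a sketch and the function name is made up. It handles accents and 'compatibility' forms such as fullwidth letters, but it does not catch lookalikes from other scripts (a Cyrillic 'а' stays Cyrillic), so the cat-and-mouse problem remains.

```python
import unicodedata


def fold_unicode(username):
    """Fold accents and compatibility forms (e.g. fullwidth letters) to plain text."""
    # NFKD splits accented letters into base letter + combining mark and
    # replaces compatibility characters with their plain equivalents.
    decomposed = unicodedata.normalize("NFKD", username)
    # Drop the combining marks that the decomposition produced.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

Run this first and then feed the result into whatever word check you use.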


 


Also, if you need a database to get you started, check here:

https://github.com/LDNOOBW


Thanks for the insight on everything and suggestions :)

 

As for the harmless names: we plan on a whitelist for certain names and places in the world as well. It will need some refining, but I think it will work :)
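One way such a whitelist could work, sketched in Python with made-up names: blank out the whitelisted terms first, then run the normal blocked-word check on whatever is left. It uses the 'SS' example from earlier in the thread.

```python
def is_allowed(username, blocked_words, allowlist):
    """Return True if no blocked word remains after removing allowlisted terms."""
    text = username.lower()
    # Replace allowlisted terms with a space, longest first, so that a longer
    # safe word is removed before any shorter one contained in it.
    for safe in sorted(allowlist, key=len, reverse=True):
        text = text.replace(safe.lower(), " ")
    return not any(word in text for word in blocked_words)
```

With `blocked_words = {"ss"}` and `allowlist = {"kiss", "stress", "lass", "miss"}`, the name "KissFan" passes while "ss_trooper" and "kiss_ss" are still rejected. Every whitelisted term is also a potential loophole, so the list needs the refining mentioned above.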


You could do some sort of a tree search thing. 

 

Go through the text looking for letters that curse words start with. If the next letter matches the second letter of one of those words, keep searching. Otherwise skip ahead.

I.e. if the letter at the current index is F, check whether the next letter is u, and so on.

But if someone types F and then H, make sure it doesn't also get flagged, because that is not a curse word.
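The letter-by-letter search described above is essentially a trie (prefix tree). A small Python sketch, with made-up function names and a made-up `"$end"` marker key:

```python
def build_trie(words):
    """Build a nested-dict trie; the "$end" key marks a complete blocked word."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node["$end"] = True
    return root


def find_blocked(text, trie):
    """Return the first blocked word found in text, or None if there is none."""
    text = text.lower()
    for start in range(len(text)):
        node = trie
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:
                break  # no blocked word continues with this letter
            if "$end" in node:
                return text[start:pos + 1]
    return None
```

A run of letters that only matches the beginning of a blocked word is dropped as soon as the next letter does not fit, so nothing is flagged until a whole word has matched.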

 

But I think DigitalGoat's suggestion is the smartest.

 

Maybe load a text file as an array of strings and do a for loop checking if

string.contains(curses[index])
