
Calling All Scam Baiters! Give me your worst!

Hello Fellow Scam Baiters!

 

Okay, so that title is a little bit confusing. Here's the thing: I need a lot of sample data. I'm trying to train an OpenAI GPT-2 model to generate natural-language replies to scammers and waste more of their time. Training these models takes a lot of data. Like, a lot of data. Like tens or hundreds of gigabytes of plaintext data. But to get started, what better place to start collecting said data than the Linus Tech Tips forum!

 

So, here's what I need from you. You can post as many replies as you want to this topic. For each correspondence (a scammer contacts you, you respond), just copy what they said and how you responded.

 

For context on how your data will be used: I'll be manually moving data out of the forum into a private database. One training point will be one correspondence, rather than an entire conversation. Think about it this way: when the scammer tries to get contact information, a lot of scam baiters will respond in a similar way. The model will try to pick up on that pattern: a request for contact information means it should respond with a fake name. So, if you want to make my job a little bit easier, only drop one correspondence in one reply.
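To make "one training point is one correspondence" concrete, here is a minimal sketch of how a conversation could be split into scammer/reply pairs. Everything in it is an assumption for illustration: the `Correspondence` class, the `split_conversation` helper, and the `<|scammer|>` / `<|baiter|>` markers are hypothetical names, not the actual pipeline. The only piece taken from GPT-2 itself is `<|endoftext|>`, which is its end-of-text token.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Correspondence:
    """One training point: a scammer message and the baiter's reply to it."""
    scammer_message: str
    baiter_reply: str

    def to_training_text(self) -> str:
        # The role markers are hypothetical; <|endoftext|> is GPT-2's
        # end-of-text token and separates one training point from the next.
        return (
            "<|scammer|>\n" + self.scammer_message.strip() + "\n"
            "<|baiter|>\n" + self.baiter_reply.strip() + "\n"
            "<|endoftext|>"
        )


def split_conversation(turns: List[Tuple[str, str]]) -> List[Correspondence]:
    """Pair each scammer message with the baiter reply that follows it.

    `turns` is a list of (speaker, text) tuples, where speaker is either
    "scammer" or "baiter". Replies with no preceding scammer message are skipped.
    """
    pairs = []
    pending = None  # most recent scammer message still waiting for a reply
    for speaker, text in turns:
        if speaker == "scammer":
            pending = text
        elif speaker == "baiter" and pending is not None:
            pairs.append(Correspondence(pending, text))
            pending = None
    return pairs
```

Posting one correspondence per reply means the splitting step is already done by the time the data gets copied out of the forum.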

We do not anticipate any risks associated with collection of this information. Your participation is completely voluntary, and there are no penalties if you choose not to participate or skip any correspondences. By participating, you are providing important, real-life information that can be used to train the next generation of neural networks for natural language processing.

You can remove your contact information and any email headers, as well as reply or forwarding headers. If you don't, they'll be removed in the data cleansing process. The datasets generated will not be made public except on valid request (i.e., you're a student at an accredited institution and you're doing research). If you have questions about this policy, please send a direct message.
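For anyone curious what that cleansing step might look like, here is a rough sketch under my own assumptions. The patterns and the `clean_message` function are hypothetical, not the actual cleaning code, and they are deliberately naive: a body line that happens to start with "To:" or "Date:" would be dropped as well.

```python
import re

# Lines that look like email headers, e.g. "From: ..." or "Subject: ...".
HEADER_LINE = re.compile(
    r"^\s*(from|to|cc|bcc|subject|date|sent|reply-to)\s*:.*$", re.IGNORECASE
)
# Separators such as "-----Original Message-----" or
# "---------- Forwarded message ---------".
FORWARD_MARKER = re.compile(
    r"^\s*-{2,}\s*(original message|forwarded message)\s*-{2,}\s*$", re.IGNORECASE
)
# Reply attribution lines such as "On Mon, Jan 1, 2020 Bob wrote:".
REPLY_MARKER = re.compile(r"^\s*on .+ wrote:\s*$", re.IGNORECASE)
# A simple (not RFC-complete) email address pattern.
EMAIL_ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def clean_message(raw: str) -> str:
    """Drop header-like lines and mask email addresses in the remaining text."""
    kept = []
    for line in raw.splitlines():
        if (
            HEADER_LINE.match(line)
            or FORWARD_MARKER.match(line)
            or REPLY_MARKER.match(line)
        ):
            continue
        kept.append(EMAIL_ADDRESS.sub("[email removed]", line))
    return "\n".join(kept).strip()
```

Pattern matching like this won't catch names or phone numbers written into the message body, so stripping those yourself before posting is still the safer option.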

"Not breaking it or making it worse is key."

"Bad choices make good stories."


  • My system specs
  • View 91 Tempered Glass RGB Edition, No PSU, XL-ATX, Black, Full Tower Case
  • ROG MAXIMUS XI EXTREME, Intel Z390 Chipset, LGA 1151, HDMI, E-ATX Motherboard
  • Core™ i9-9900K 8-Core 3.6 - 5.0GHz Turbo, LGA 1151, 95W TDP, Processor
  • GeForce RTX™ 2080 Ti OC ROG-STRIX-RTX2080TI-O11G-GAMING, 1350 - 1665MHz, 11GB GDDR6, Graphics Card
  • ROG RYUJIN 360, 360mm Radiator, Liquid Cooling System
  • 32GB Kit (2 x 16GB) Trident Z DDR4 3200MHz, CL14, Silver-Red DIMM Memory
  • AX1600i Digital, 80 PLUS Titanium 1600W, Fanless Mode, Fully Modular, ATX Power Supply
  • Formula 7, 4g, 8.3 (W/m-K), Nano Diamond, Thermal Compound
  • On AIO cooler 6 x NF-F12 IPPC 3000 PWM 120x120x25mm 4Pin Fibre-glass SSO2 Heptaperf Retail
  • 6 x NF-A14 IPPC-3000 PWM 140mm, 3000 RPM, 158.5 CFM, 41.3 dBA, Cooling Fan
  • 1TB 970 PRO 2280, 3500 / 2700 MB/s, V-NAND 2-bit MLC, PCIe 3.0 x4 NVMe, M.2 SSD
  • Windows 10 Pro 64-bit 
  • Beyerdynamic MMX 300 (2nd Generation) Premium Gaming Headset
  • ROG PG279Q
  • Corsair K95 Platinum XT
  • ROG Sica

Isn't KitBoga doing the exact same thing? 



3 hours ago, Arika S said:

Isn't KitBoga doing the exact same thing? 

Maybe this is he.

 

Illuminati confirmed?


9 hours ago, Thomas001 said:

This isn't about seeing how one person responds to a scammer - that I could do relatively easily or just fabricate the data myself. It's about seeing how a lot of different people respond to a scammer.

The computer's goal is to figure out the common traits among certain inputs and link them with certain outputs; the kind of model that learns this mapping is called a neural network. However, when making a model, we have to be careful not to over-fit the data. An over-fit model latches onto the quirks of the specific examples it was trained on instead of the general pattern, so what it produces for new inputs is either a parroted copy of its training data or garbled.

Think about it this way: if I trained a neural network on a bunch of LinusTechTips scripts, it would end up learning that humans say "speaking of ___, our sponsor: Pulseway" very, very frequently. In reality, humans don't say that; Linus specifically says that. But if I trained the neural network on a bunch of tech YouTuber scripts, like those from LinusTechTips, Bitwit, JayzTwoCents, AND Barnacules, it would learn how to speak like a tech YouTuber, not just Linus.

As another example, if I trained a neural network on the r/pcmasterrace subreddit, I'd get a neural network tailored to what r/pcmasterrace says frequently and how its users act. Any dialog that came out of that would be incredibly tech heavy. If I trained that same neural network on the r/roastme subreddit, any dialog that came out would be super condescending and would generally act like a dick. But if I trained that same neural network on the top 500 most active subreddits, it would learn to speak like the Internet, not like any one person (complete with ROFLs and LMAOs). If you'd like to learn more, HMU; I'd be happy to elaborate!

In this case, I don't want to train a neural network to speak just like Jim Browning or KitBoga. That wouldn't be very useful or representative of how the general scam baiting community would respond to a scammer (plus, there's just not enough data between them, even combined).
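Here's a toy illustration of the single-source problem. To be clear, this is just a phrase count, not a neural network, and the sample lines are made up for the example. The idea is that a model that fits its training data closely will reproduce roughly the phrase frequencies it saw, so a catchphrase that dominates one person's data gets diluted once more sources are mixed in.

```python
def phrase_share(corpus, phrase):
    """Return the fraction of lines in `corpus` that contain `phrase`."""
    hits = sum(phrase in line.lower() for line in corpus)
    return hits / len(corpus)


# Made-up lines standing in for one person's scripts.
single_source = [
    "speaking of cooling, our sponsor today",
    "speaking of storage, our sponsor today",
    "this gpu runs hot",
]

# The same lines plus made-up lines standing in for other sources.
many_sources = single_source + [
    "let's look at the benchmarks",
    "here is how to build a pc",
    "welcome back to the channel",
    "that cable management is rough",
    "today we test a new keyboard",
    "thanks for watching",
]

narrow = phrase_share(single_source, "our sponsor")  # 2 of 3 lines
broad = phrase_share(many_sources, "our sponsor")    # 2 of 9 lines
print(f"single source: {narrow:.2f}, many sources: {broad:.2f}")
```

Same logic for scam baiting: replies from one or two baiters would make their personal habits look like the norm, which is why I'm asking for replies from lots of different people.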

 

7 hours ago, Arika S said:

Isn't KitBoga doing the exact same thing? 

Kind of, but not really. KitBoga's schtick is to just waste the scammer's time. While it's incredibly entertaining, it's not really what I'm after. I'm looking to actually train a machine learning model to mimic KitBoga, Jim Browning, and the thousands of other scam baiters out there to waste even more of the scammers' time. Now you might say, "Well, hasn't that already been done with Sp@mnesty and/or Re:scam?" You're not wrong, but in addition to just training this neural network and creating the model, I'd ultimately like to release that research to the public so that (a) we can further machine learning and natural language processing research, (b) I can waste some more scammer time by deploying some of these bots on my own email server, and (c) anyone else who wants to get into scam baiting but doesn't know how to respond can use the tool.

3 hours ago, Curious Pineapple said:

Maybe this is he.

 

Illuminati confirmed?

Guess you'll never know...

Spoiler: no, I'm not KitBoga.
