Look, we can debate the proper and private way to do Captchas all day, but if we remove the existing implementation we will be plunged into a world of hurt.
I run tucson.social - a tiny instance with barely any users and I find myself really ticked off at other admins’ abdication of duty when it comes to engaging with the developers.
For all the Fediverse discussion on this, where are the GitHub issue comments? Where is our attempt to convince the devs on this?
No, seriously WHERE ARE THEY?
Oh, you think the mere existence of an “Issue” to bring back Captchas is the best you can do?
NO it is not the best we can do, we need to be applying some pressure to the developers here and that requires EVERYONE to do their part.
The Devs can’t make Lemmy an awesome place for us if us admins refuse to meaningfully engage with the project and provide feedback on crucial things like this.
So are you an admin? If so, we need more comments here: https://github.com/LemmyNet/lemmy/issues/3200
We need to make it VERY clear that Captcha is required before v0.18’s release. Not after when we’ll all be scrambling…
EDIT: To be clear I’m talking to all instance admins, not just Beehaw’s.
UPDATE: Our voices were heard! https://github.com/LemmyNet/lemmy/issues/3200#issuecomment-1600505757
The important part was that this was a decision to re-implement the old (if imperfect) solution in time for the upcoming release. mCaptcha and better techs are indeed the better solution, but at least we won’t make ourselves more vulnerable at this critical juncture.
Just created the instance, and the spammers have already consumed my whole email allowance for today :( I just enabled CAPTCHA, so now I’m going to wait until tomorrow to see how much of a difference it makes.
Glad to know I was here and did my part by reading this post. We couldn’t have succeeded without me!🫡
There are other options.
I’m just a hobbyist, but I have built a couple websites with a few hundred users.
A stupidly simple and effective option I’ve been using for several years now, is adding a dummy field to the application form. If you add an address field, and hide it with CSS, users won’t see it and leave it blank. Bots on the other hand will see it and fill it in, because they always fill in everything. So any application that has an address can be automatically dropped. Or at least set aside for manual review.
I don’t know how long such a simple trick will work on larger sites. But other options are possible.
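To make the dummy-field idea concrete, here is a minimal server-side sketch in Python. It assumes the hidden input is named `address` and that the submitted form arrives as a plain dict; both are illustrative assumptions, not anything Lemmy actually does.

```python
# Hypothetical honeypot check. The field name "address" is an assumption:
# it is the dummy input that real users never see because CSS hides it.
HONEYPOT_FIELD = "address"


def is_probable_bot(form: dict) -> bool:
    """Return True when the hidden dummy field was filled in."""
    value = form.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


def triage(form: dict) -> str:
    """Set suspicious applications aside instead of silently dropping them."""
    return "manual_review" if is_probable_bot(form) else "accept"
```

Routing hits to manual review rather than dropping them keeps the cost of a false positive low, for example when a browser autofill populates the hidden field.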
Couldn’t the bots just be programmed to not fill out that field? Or not fill out any field flagged as hidden?
Yes, but it would take extra work specific to this one trick, and as long as the technique isn’t widespread, bot makers would view that work as impractical.
You’d think so.
But it’s not flagged as hidden. Instead you use CSS to set display to none. So the bot needs to do more than look at the raw HTML. It needs to fully analyze all the linked CSS and even JavaScript files. Basically it needs to be as complex as a whole browser. It can’t be a simple script anymore. It becomes impractically complicated for the bot maker.
This might work against very generic bots, but it won’t work against specialized bots. Those wouldn’t even need to parse the DOM, just recreate the HTTP requests.
Which is why you’d need something else for popular sites worth targeting directly. But there are more options than standard captchas. Replacing them isn’t necessarily a bad idea.
This is what I’m worried about. As the fediverse grows and gains popularity it will undoubtedly become worth targeting. It’s not hard to imagine it becoming a lucrative target for things like astroturfing, vote brigading etc bots. For centralized sites it’s not hard to come up with some solutions to at least minimize the problem. But when everyone can just spin up a Lemmy, Kbin, etc instance it becomes a much, much harder problem to tackle because instances can also be run by bot farms themselves, where they have complete control over the backend and frontend as well. That’s a pretty scary scenario which I’m not sure can be “fixed”. Maybe something can be done on the ActivityPub side, I don’t know.
That’s where simple defederation happens. It’s mostly why Beehaw cut off lemmy.world.
What if you have 100s or 1000s of such instances? At some point you defeat the entire purpose of the federation.
When you automate a browser process like signing up, you very likely manually set in your code the fields you want to fill, not sure why a bot would do that automatically… I don’t think this would be effective at all
Fun fact, I purposefully goaded the bots into attacking my instance.
Turns out they aren’t even using the web form, they’re going straight to the register api endpoint with python. The api endpoint lives at a different place from the signup page and putting a captcha in front of that page was useless in stopping the bots. Now, we can’t just challenge requests going to the API endpoint since it’s not an interactive session - it would break registration for normal users as well.
The in-built captcha was part of the API form in a way that prevented this attack where the standard Cloudflare rules are either too weak (providing no protection) or too strong (breaking functionality).
In my case I had to create some special rules to exclude python clients and other bots while making sure to keep valid browser attempts working. It was kind of a pain, actually. There’s a lot of Lemmy that seems to trip the optional OWASP managed rules so there’s a lot of “artisanally crafted” exclusions to keep the site functional.
Anyways, I guess my point is form interaction is just one way to spam sites, but this particular attacker is using the backend API and forgoing the sign-up page entirely. Hidden fields wouldn’t be useful here, IMO.
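For anyone wondering what “exclude python clients while keeping browsers working” boils down to, here is a rough sketch of the logic in Python. This is not Cloudflare rule syntax and not my actual ruleset; the marker list and function name are made up for illustration, and a User-Agent header is trivially spoofed, so treat it as a speed bump only.

```python
# Illustrative only: the marker substrings are assumptions.
# "python-requests" is the default User-Agent prefix of the requests library.
SCRIPT_MARKERS = ("python-requests", "python/requests", "curl", "go-http-client")
REGISTER_PATH = "/api/v3/register"


def should_block(path: str, user_agent: str) -> bool:
    """Block scripted clients on the register endpoint, leave everything else alone."""
    if path != REGISTER_PATH:
        return False
    ua = (user_agent or "").lower()
    # An empty User-Agent on the register endpoint is treated as scripted too.
    return ua == "" or any(marker in ua for marker in SCRIPT_MARKERS)
```

Scoping the check to the register path is the point: the rest of the API keeps working for apps and federation traffic.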
Nutomic has said they’re open to restoring captchas, but it will require a fair amount of work to bring the 0.17 implementation into 0.18, which they currently don’t have the bandwidth to implement.
They’ve also said they’re open to PRs, so if someone really wants this feature they can open a PR for inclusion in the 0.18 release.
NO it is not the best we can do, we need to be applying some pressure to the developers here and that requires EVERYONE to do their part.
I sure hope you’re supporting them financially considering the demands you’re making that require their time and labor.
Someone has already submitted a PR with the changes the dev recommended. The captcha stuff is in a new db table instead of in-memory at the websocket server.
However, from one of the devs:
One note, is that captchas (and all signup blocking methods) being optional, it still won’t prevent people from creating bot-only instances. The only effective way being to block them, or switch to allow-only federation.
Once people discover the lemmy-bots that have been made that can bypass the previous captcha method, it also won’t help (unless a new captcha method like the suggested ones above are implemented).
The root of the issue seems to be that they’ve removed websockets, for the following reasons:
Huge burden to maintain, both on the server and in lemmy-ui. Possible memory leaks. Not scalable.
I can understand them wanting to make their lives a bit easier (see “huge burden to maintain”) - Lemmy has exploded recently (see “not scalable”) and there are far bigger issues to fix, and an even larger number of bad actors (see “possible memory leaks”) who have learned about Lemmy at the same time as everyone else and want to exploit or break it.
Hunh.
I just had a surge of user registrations on my instance.
All passed the captcha. All passed the email validation.
All had a valid-sounding response.
I am curious to know if they are actual users, or… if I just became the host of a spam instance. :-/
Doesn’t appear to be an easy way to determine.
Hmmm, I’d check the following:
- Do the emails follow a pattern? (randouser####@commondomain.com)
- Did the emails actually validate, or do you just not see bouncebacks? There is a DB field for this that admins can query (I’ll dig it up after I make this high level post)
- Did the surge come from the same IP? Multiple? Did it use something that doesn’t look like a browser?
- Did the surge traffic hit /signup or did it hit /api/v3/register exclusively?
With those answers I should be able to tell if it’s the same or similar attacker getting more sophisticated.
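If your logs are in a shape you can load into Python, the checklist above can be tallied in a few lines. A rough sketch, assuming each signup is a dict with `ip`, `user_agent`, `path` and `email` keys (that record layout is my assumption, adapt it to whatever your proxy logs look like):

```python
import re
from collections import Counter

# Matches generated-looking addresses such as randomuser1234@example.com:
# lowercase letters followed by exactly four digits before the "@".
GENERATED_EMAIL = re.compile(r"^[a-z]+\d{4}@")


def summarize(signups: list[dict]) -> dict[str, Counter]:
    """Count signups per IP, user agent and request path."""
    return {
        "ip": Counter(s["ip"] for s in signups),
        "user_agent": Counter(s["user_agent"] for s in signups),
        "path": Counter(s["path"] for s in signups),
    }


def generated_looking(signups: list[dict]) -> list[str]:
    """Return the emails that follow the letters-plus-four-digits pattern."""
    return [s["email"] for s in signups if GENERATED_EMAIL.match(s["email"])]
```

A surge where one IP or one user agent dominates, or where nothing ever touched /signup, points at a script rather than a wave of people.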
Some patterns I noticed in the attacks I’ve received:
- it’s exactly 9 attempts every 30 minutes from the user agent “python/requests”
- The users that did not get an email bounceback were still not authenticated hours later (maybe the attacker lucked out with a real email that didn’t bounce back?). There was no effort to verify from what I could determine.
Some vulnerabilities I know that can be exploited and would expect to see next:
- ChatGPT is human enough sounding for the registration forms. I’ve got no idea why folks think this is the end-all solution when it could be faked just as easily.
- Duplicate email conflicts can be bypassed by using a “+category” in your email, e.g. someuser+lemmy@somedomain.com. This would allow someone to associate potentially hundreds of spam accounts with a single email.
- Different providers, no pattern. Some gmail. some other.
- Not sure
- Also- not sure.
- Not sure of that either!
But, here is the interesting part- Other than a few people I have personally invited, I don’t think anyone else has ever requested to join.
Then, out of the blue, boom, a ton of requests. And- then, nothing followed after.
The responses sounded human enough. spez bad, reddit sinking, etc.
But, the traffic itself, didn’t follow… what I would expect from social media spreading. /shrugs.
Curious if you got a mention somewhere on reddit. It used to happen to our novelty sub whenever a thread blew up and suddenly thousands of eyes were on a single comment with the subreddit link.
That is my theory too. But, I have been unable to confirm, nor deny where the traffic originates.
Huh, that is interesting, yeah, that pattern is very anomalous. If you have DB access you can try to run this query to return all un-verified users and see if you can identify if the email activations are being completed:
SELECT p.id, p.name, l.email FROM person AS p LEFT JOIN local_user AS l ON p.id = l.person_id WHERE p.local = true AND p.banned = false AND l.email_verified = false;
Only 7 accounts still pending, 2 of which, are unrelated to the above flood.
The email addresses are left out for privacy- however, they are EXTREMELY normal sounding email addresses.
Based on the provided emails, usernames, and request messages- I’d say, it certainly looks like legit users.
Just- very odd timing.
5 huh? That’s actually notable. So far I haven’t seen a real human user take longer than a couple of hours to validate. Human registrations on my instance seem to have a 30% attrition. That is, of 10 real human users, I can reasonably expect that 3 won’t complete the flow. It seems like your case might be nearing 40-50%, which isn’t unheard of, but couple that with how quickly these accounts were created - I think you are looking at bots.
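The attrition math is just unverified accounts divided by total signups in the window. A quick sketch, with the caveat that the surge size of 11 below is a made-up number for illustration since the actual total wasn’t posted:

```python
def attrition_rate(unverified: int, total_signups: int) -> float:
    """Fraction of signups that never completed email verification."""
    if total_signups <= 0:
        raise ValueError("total_signups must be positive")
    return unverified / total_signups


# Baseline from my instance: 3 of 10 real users don't finish the flow.
baseline = attrition_rate(3, 10)

# 5 pending accounts out of a HYPOTHETICAL surge of 11 signups.
surge = attrition_rate(5, 11)

print(f"baseline {baseline:.0%}, surge {surge:.0%}")  # baseline 30%, surge 45%
```

On its own a gap like that is weak evidence; it gets more interesting when combined with how tightly the signups cluster in time.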
The kicker is, though, if one of them IS a real user, it’s going to be almost impossible to find out.
This is indeed getting more sophisticated.
I wish I could see this time period on a cloudflare security dashboard, I’m sure there could be a few more indicators there.
cloudflare security dashboard
Didn’t really see anything that stood out there either. A handful of users accessing via tor, but that’s about it.
Ended up turning the security policy from low, back up a bit though, forgot I turned it down while troubleshooting some federation issues.
Oh! I just remembered something. Isn’t there a site that recommends a lemmy instance? Might it make sense that multiple users found your website because they change the recommendation to distribute new users to smaller instances (hourly perhaps)? Does that sort of pattern hold in this case?
ChatGPT is human enough sounding for the registration forms. I’ve got no idea why folks think this is the end-all solution when it could be faked just as easily.
A simple deterrent for this could be to “hide” some information in the rules and request that information in the registration form. Not only are you ensuring that your users have at least skimmed the rules, you’re also raising the bar of difficulty for spammers using LLMs to generate human-sounding applications for your instance. Granted it’s only a minor deterrent, this does nothing if the adversary is highly motivated, but then again the same can be said of a lot of anti-spammer solutions. :)
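Checking for the hidden bit of information can be automated so admins only hand-review the applications that miss it. A small sketch in Python, where the phrase “purple saguaro” is a placeholder I invented and the function names are hypothetical:

```python
import unicodedata

# Placeholder phrase tucked into the instance rules; pick your own and rotate it.
EXPECTED_PHRASE = "purple saguaro"


def normalize(text: str) -> str:
    """Fold case, unify Unicode forms and collapse runs of whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def mentions_rule_phrase(application_text: str, expected: str = EXPECTED_PHRASE) -> bool:
    """True when the application answer contains the phrase from the rules."""
    return normalize(expected) in normalize(application_text)
```

The normalization is there so a real person who types “Purple  Saguaro” with odd spacing or capitalization isn’t rejected.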
ChatGPT is human enough sounding for the registration forms. I’ve got no idea why folks think this is the end-all solution when it could be faked just as easily.
I think it would be interesting if we could find a prompt that doesn’t work well with LLMs. Originally they struggled with math for example, but I wonder if it’d be possible to make a math problem that’s simple enough for most humans to solve but which trips up LLMs into outputting garbage.
Duplicate Email conflicts can be bypassed by using a “+category” in your email.
I personally use this to track who sends my email address where, since people usually don’t strip this from the address. It’s definitely abusable, but also has legitimate uses.
When it comes to LLMs we could use questions which they refuse to answer.
Obviously ‘How to build a pipe bomb’ is out of the question, but something like ‘What’s your favorite weapon of mass destruction?’, or ‘If you’d need to hide a body, how would you do it?’ might be viable
Not so sure on the LLM front, GPT4+Wolfram+Bing plugins seems to be a doozy of a combo. If anything there should be perhaps a couple interactable elements on the screen that need to be interacted with in a dynamic order that’s newly generated for each signup. Like perhaps “Select the bubble closest to the bottom of the page before clicking submit” on one signup and “Check the box that’s the furthest to the right before clicking submit”?
Just spitballin it there.
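Still spitballing, but the server side of a per-signup instruction could look something like this. The challenge keys and wording are invented for the example, and storing the expected key in the session is assumed rather than shown:

```python
import secrets

# Hypothetical challenge pool: key = the element the user must interact with,
# value = the instruction shown on the signup page.
CHALLENGES = {
    "bottom_bubble": "Select the bubble closest to the bottom of the page before clicking submit",
    "rightmost_box": "Check the box that is furthest to the right before clicking submit",
}


def new_challenge() -> tuple[str, str]:
    """Pick a fresh challenge for this signup; keep the key server-side."""
    key = secrets.choice(list(CHALLENGES))
    return key, CHALLENGES[key]


def verify(expected_key: str, clicked_key: str) -> bool:
    """Compare the element the client interacted with against the stored key."""
    return expected_key == clicked_key
```

With only two options a bot guesses right half the time, so a real version would need a much larger pool and randomized element layout.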
As for the category on email address - certainly not suggesting they remove supporting it, buuuuutttt if we’re all about making sure 1 user = 1 email address, then perhaps we should make the duplication check a bit more robust to account for these types of emails. After all someuser+lemmy@somedomain.com is the same as someuser@somedomain.com but the validation doesn’t see that. Maybe it should?
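A more robust duplicate check could compare a canonical form of the address instead of the raw string. A minimal sketch, assuming that everything after a “+” in the local part is a tag and that case doesn’t matter; both hold for common providers like Gmail but are not guaranteed everywhere, and `canonical_email` is my own name, not a Lemmy function:

```python
def canonical_email(address: str) -> str:
    """Strip a "+tag" suffix from the local part and lowercase the result."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Keep the original local part if stripping the tag would leave nothing.
    base = local.split("+", 1)[0] or local
    return f"{base}@{domain}".lower()
```

The signup would still send mail to the address exactly as typed; only the uniqueness check would use the canonical form, so legitimate “+category” users keep their tags.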
I think what you can do is take a small subset of users that have registered in your instance and observe their behavior. If you notice that a lot of them are acting in bad faith, then it’s likely that a lot of the user registrations in your instance are bots. How active are the users in your instance in terms of posting and commenting?
Been keeping an eye- I don’t think any of them are actually even active. At least, in the sense I don’t see any posts/comments.
I mean for now it seems okay. I took the liberty of checking out your instance and it seems okay to me too, but still keep an eye out for bad actors.
My current assumption- based on the data I dug up, it appears to be legit traffic originating from reddit.
I just don’t think the users realize their account was approved… perhaps. /shrugs.
Unexpected wave of traffic I suppose.
Possibly people who don’t get approved immediately move on to another server and settle in.
We need to make it VERY clear that Captcha is required before v0.18’s release. Not after when we’ll all be scrambling…
You would honestly be surprised. Captcha isn’t nearly as effective at stopping spam as you might expect.
Most of the “spambot” developers started using AI-based tools a while back.
It only helps stop the lowest-hanging fruit.
Also, due to the way federation and all works… well, just remember, there are a million ways for spammers to get access currently…
While I am glad that they listened to the community, everyone seems to forget that you do not have to upgrade to the latest version. If the risks outweigh the benefits it’s perfectly fine to stay at the last working version. There is also the possibility of backports (forward ports in this case?) and manual restoration of Captchas to get the features you want out of the later versions if you have the development skillset to do so.
There’s nothing stopping instance owners from incorporating their own security measures into their infrastructure as they see fit, such as a reverse proxy with a modern web application firewall, solutions such as Cloudflare and the free captcha capabilities they offer, or a combination of those and/or various other protective measures. If you’re hosting your own Lemmy instance and exposing it to the public, and you don’t understand what would be involved in the above examples or have no idea where to start, then you probably shouldn’t be hosting a public Lemmy instance in the first place.
It’s generally not a good idea to rely primarily on security to be baked into application code and call it a day. I’m not up to date on this news and all of the nuances yet, I’ll look into it after I’ve posted this, but what I said above holds true regardless.
The responsibility of security of any publicly hosted web application or service rests squarely on the owner of the instance. It’s up to you to secure your infrastructure, and there are very good and accepted best practice ways of doing that outside of application code. Something like losing baked in captcha in a web application should come as no big deal to those who have the appropriate level of knowledge to responsibly host their instance.
From what this seems to be about, it seems like a non-issue, unless you’re someone who is relying on baked in security to cover for your lack of expertise in properly securing your instance and mitigating exploitation by bots yourself.
I’m not trying to demean anyone or sound holier than thou, but honestly, please don’t rely on the devs for all of your security needs. There are ways to keep your instance secure that don’t require their involvement, and that are best practice anyways. Please seek to educate yourself if this applies to you, and shore up the security of your own instances by way of the surrounding infrastructure.
I’m surprised some large instances aren’t using Cloudflare. It takes a few minutes to set up and the added benefit of caching alone is worth it, let alone the bot/ddos protection.
I know right? The free tier would be enough to handle most anything and would take a tremendous load off of the origin server with proper Cache Rules in place. I can’t remember which instance it was, but one of the big ones started to use Cloudflare but then backtracked because of “problems”. When I saw that, I couldn’t help but think that they just didn’t know what they were doing.
You ALL have a responsibility to communicate back to lemmy devs to try to stop it.
No I don’t. Stop trying to brigade people to an issue. If you have an issue with it… Fork the lemmy UI code and make your own. Or stay on pre 0.18 code.
It’s one thing to bring awareness to the issue. It’s another to demand that I take action on something that’s not only a non-issue for me (and likely many other admins of instances) but that the devs don’t have to support. You’re not paying them… you’re not their mother. You don’t get to force them to do anything they don’t want to do.
Honestly the captchas that lemmy uses are terrible anyway. https://addons.mozilla.org/en-US/firefox/addon/2captcha-solver/ You can even solve them yourself as a browser extension… There’s no point to them in today’s world.
Exactly, instance admins that want to keep CAPTCHA have two good options here:
- Stay on 0.17.x until 0.18.y drops that re-implements CAPTCHA satisfactorily
- Fork and modify lemmy to version 0.18-captcha, undo the commit removing the old Captcha code.
I totally get the project maintainers are stubborn but no one has a “responsibility to stop the devs from doing it”. It reeks of open-source entitlement.
You won’t see me making call to action posts for undelivered features or other small-fry items. I’m a dev, I get it.
But there are always times where vulnerabilities come up and a dev might not otherwise know that it’s being exploited. It’s one thing to have a feature to fix that vulnerability and get to it as part of your own priority list. It’s another when that vulnerability is actively impacting the people using the software - that’s when getting vocal about an issue is appropriate to help me alter my priorities, IMO.
Your concerns about security of the application and community are valid. I get that this is essentially a vulnerability that should be mitigated and fixed. Raising awareness of it is fine.
Where I take issue, though I suppose you didn’t entirely intend this, is with the idea that our responsibility is to put pressure on the main developers to fix the issue before the 0.18 release and dictate their priorities for them.
I would rather we discuss workarounds, mitigation steps in the interim, assist in solving the issues through Pull Requests and discussion on the issues page and forums. I just think it’s a bad idea to point blaming fingers at devs for being slow to respond, or badger them to make these changes, when they are volunteering their own time to share Lemmy with us (some also maintaining Jerboa and Lemmy UI at the same time)
With the way the licensing is, I would rather the project be forked by someone that would want to fix the issue. The repo maintainers are entitled to set their own priorities, just like Lemmy instance admins are allowed to determine how they run the server.
Thank you for the measured take on this.
You are correct, I don’t intend to pressure or cause harm! But I certainly see the results, and it is indeed pressure. As another commenter pointed out, there are many instance admins who work a bit closer to the team on the Matrix chatrooms and that’s their preferred method of communication. Now that I know this, I’ll let things cool down and join myself. I definitely intend to contribute where I can in the codebase, and I wouldn’t dream of escalating to public pressure for smaller concerns.
However, I have a slight, and perhaps pedantic, disagreement about making changes. In this case, the request was for not making a change. If it weren’t for the fact that the feature was already ripped out it would be as simple as not removing it (or in this case re-working it a bit). I understand that it isn’t the current reality, and that it required work to revert - and if not for a ton of spambots, I think it would’ve been easier to adapt.
Ultimately it will take time to discuss workarounds and help others implement them, and the deadline is ultimately the arrival of the version that drops the older captcha (or was, in this case - it’s getting merged back in as we speak - might even be done now). With that reality, I had a sense that this could be an existential problem for the early Threadiverse.
I definitely didn’t intend to suggest that the Devs were in any way at fault here. I read the GitHub issues enough to come away with the impression that the feedback they were receiving seemed to be “Admins and devs alike are okay moving forward and opinions to the contrary are minimal, let’s move forward”. My post was definitely intended to be a way to communicate using raw numbers (but not harassment). I’d like to think I’m fairly pragmatic in that if the captcha IS working for folks, then that is a contrary opinion, and it was missing from the issue.
Where I definitely failed was my overly emotional messaging. It’s certainly not an excuse, but my recent autism diagnosis does at least help explain why I have an extremely strong sense of justice and can sometimes react in ways that are less than productive.
As for the licensing, I agree! I’m talking to some good friends of mine because I want to take my instance WAY further than most others - goal is a non-profit that answers to Tucsonans and residents of larger Pima county rather than someone not in the community. There’s just a lot of features this concept would need that it might diverge so much from the Lemmy vision that it needs to be something new - and hopefully a template for hyper-local social networks that can take on Nextdoor.
I can see better where our disagreement is, and I appreciate you being reasonable about it as well. Thank you for that.
Sounds like you have some great plans coming with your Tucson social project. All the best!
Why would devs remove something like this, at this time? Is it causing huge problems larger than the problems removing it would cause?
Makes me wonder if the devs are being paid to cripple Lemmy. This is where open source shines, we don’t have to be held hostage to one product/service.
It looks like they decided to bring it back in time for the next release! - https://github.com/LemmyNet/lemmy/issues/3200#issuecomment-1600505757
They specifically mentioned the feedback in the ticket and it goes to show how collective action can work.
Despite how others felt that I was trying to start a “brigade” - I was only trying to raise awareness by being collectively vocal. I never asked folks to abuse devs or “force” them to do something. I asked them to make their concerns known and let the devs choose. It’s just that when I posted there were far fewer comments, and if I were the developer I wouldn’t know that this issue is important to a lot of people - at least just looking at the GitHub issues anyways.
Captchas are pretty much worthless. They’re easily bypassed for basically free. You’re better off putting your instance behind Cloudflare with their captcha.
Okay, so do you mind explaining why the servers onboarding the most spam users are the ones without Captchas?
If they are so ineffective, why are they effective now?
Invisible captchas are about as useful as graphical ones and are significantly less annoying to the end user
I’m mixed about this. When applied correctly, a graphical captcha will let zero bots in, at the expense of false positives and frustrated users. On the other hand, invisible / proof-of-work captchas will let a fraction of the bots in (blocking the majority but not all bots, by design), while providing a better experience for legitimate users. Pick your poison basically.
When applied correctly, a graphical captcha will let zero bots in
Absolutely untrue. There are services that will solve captchas for you for hundredths of a penny. It’s essentially free.
I don’t have a responsibility to do anything. This isn’t my issue and you can’t force me to care.
Caron? More like doesn’tcaron! Ha! Amirite!?
I can’t tell if this is parody.