I removed some spam from Globalnews.ca just now. Thanks for the reports.
These are real stories coming from the RSS feed, but that doesn’t mean they belong here. Some free news sites have feeds that include obvious spam. Presumably they’re trying to make money so they can stay in existence, which is fine, but that doesn’t mean the spam needs to pollute these communities and the feeds of the people subscribed to them.
Currently, the blacklist, with some new additions from today’s spam, is:
BLACKLIST_REGEXES = [
r'Shop our top 5 deals of the week',
r'Amazon deal of the day.*',
r'Today.s Wordle.*',
r'Wordle today:.*',
r'.*NYT Connections.*',
r'.*[A-Z][A-Z][A-Z][A-Z][A-Z].*[A-Z][A-Z][A-Z][A-Z][A-Z].*[A-Z][A-Z][A-Z][A-Z][A-Z].*',
r'Daily Deal:.*',
r'Shop our .*',
r'.*\(on sale now.*',
r'.*Big Deal Days.*',
r'.*Way Day Sale.*',
]
The middle one with the capital letters filters out YouTube channels that like to put a lot of all-caps clickbait in their titles. If a title contains at least three runs of five capital letters in a row, the post doesn’t get put onto Lemmy.
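For anyone curious how a list like this behaves, here is a rough sketch of one way it could be applied to a title. This is not the bot's actual code: the `is_blacklisted` helper is made up for illustration, the list is only a subset of the one above, and the use of `re.match` (which anchors each pattern at the start of the title) is an assumption based on how the patterns are written.

```python
import re

# A subset of the patterns from the list above, copied unchanged.
BLACKLIST_REGEXES = [
    r'Amazon deal of the day.*',
    r'Today.s Wordle.*',
    r'.*NYT Connections.*',
    r'.*[A-Z][A-Z][A-Z][A-Z][A-Z].*[A-Z][A-Z][A-Z][A-Z][A-Z].*[A-Z][A-Z][A-Z][A-Z][A-Z].*',
    r'Shop our .*',
    r'.*\(on sale now.*',
]

# Compile once so each title check is just a loop over ready-made patterns.
COMPILED = [re.compile(pattern) for pattern in BLACKLIST_REGEXES]


def is_blacklisted(title: str) -> bool:
    """Hypothetical helper: True if any pattern matches from the start of the title.

    Assumption: the real bot may use re.search or re.fullmatch instead.
    """
    return any(regex.match(title) for regex in COMPILED)


if __name__ == "__main__":
    samples = [
        "Shop our favourite fall finds",
        "SHOCKING: you WON'T BELIEVE what HAPPENED next",
        "NASA and NATO meet in OTTAWA",
        "City council approves new transit budget",
    ]
    for sample in samples:
        print(is_blacklisted(sample), sample)
```

Under these assumptions, the all-caps pattern only trips when a title has three separate runs of five capitals, so a headline with a few short acronyms still gets through. The `.` in `Today.s Wordle` matches either a straight or a curly apostrophe.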
If you see more spam, or stories that seem like pollution, keep reporting them. I might tell the bot to preemptively remove stories that get spam reports. A couple of people have tried to report things just because they disagree with them, but almost all the spam reports are legit, so it might make sense to default to that behavior, and I can fix it up afterwards if people report things bogusly.
I’ll keep you posted. Thanks for the reports.