lemme in@lemm.ee to

Fuck AI@lemmy.world · 3 months ago

A new web crawler launched by Meta last month is quietly scraping the web for AI training data

Meta has not announced the new bot, dubbed Meta External Agent, beyond updating an existing web page for developers.

Meta has quietly unleashed a new web crawler to scour the internet and collect data en masse to feed its AI model.

The crawler, named the Meta External Agent, was launched last month, according to three firms that track web scrapers and bots across the web. The automated bot essentially copies, or “scrapes,” all the data that is publicly displayed on websites, for example the text in news articles or the conversations in online discussion groups.

A representative of Dark Visitors, which offers a tool for website owners to automatically block all known scraper bots, said Meta External Agent is analogous to OpenAI’s GPTBot, which scrapes the web for AI training data. Two other entities involved in tracking web scrapers confirmed the bot’s existence and its use for gathering AI training data.

While close to 25% of the world’s most popular websites now block GPTBot, only 2% are blocking Meta’s new bot, data from Dark Visitors shows.
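For site owners wondering what "blocking" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser` to check a robots.txt against specific crawlers. The user-agent tokens `meta-externalagent` and `GPTBot`, the sample robots.txt, and the `example.com` URL are assumptions for illustration and do not come from the article. robots.txt is also only advisory, so this shows what a well-behaved crawler would be told, not what any given bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The user-agent tokens below are assumptions,
# not taken from the article; check each vendor's documentation.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    for agent in ("meta-externalagent", "GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Compliance is voluntary on the crawler's side, so a rule like this only helps against bots that choose to honor it.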

Earlier this year, Mark Zuckerberg, Meta’s cofounder and longtime CEO, boasted on an earnings call that his company’s social platforms had amassed a data set for AI training that was even “greater than the Common Crawl,” an entity that has scraped roughly 3 billion web pages each month since 2011.

Admiral Patrick@dubvee.org
3 months ago
Not just data theft, but selling stolen goods (more or less).

They’re stealing content and using that to build a service that they sell and profit from.
- Drewelite@lemmynsfw.com
  3 months ago
  They do open-source the model at least, which is more than you can say for any of the other major companies doing AI.
