fattyfoods@feddit.nl to Open Source@lemmy.ml · 2 months ago

The Open-Source Software Saving the Internet From AI Bot Scrapers

www.404media.co

Anubis, which blocks AI scrapers from scraping websites to death, has been downloaded almost 200,000 times.
  • bdonvr@thelemmy.club
    2 months ago

    Ooh can this work with Lemmy without affecting federation?

    • Captain Beyond@linkage.ds8.zone
      2 months ago

      Yes.

      Source: I use it on my instance and federation works fine

      • bdonvr@thelemmy.club
        2 months ago

        Thanks. Anything special configuring it?

        • Captain Beyond@linkage.ds8.zone
          2 months ago

          I keep my server config in a public git repo, but I don’t think you have to do anything really special to make it work with lemmy. Since I use Traefik I followed the guide for setting up Anubis with Traefik.

          I don’t expect to run into issues, as Anubis specifically looks for user-agent strings that look like human users (i.e. they contain the word “Mozilla”, as most graphical web browsers do). Any request clearly coming from a bot that identifies itself is left alone, and Lemmy identifies itself as “Lemmy/{version} +{hostname}” in requests.
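A rough sketch of the user-agent rule described in the comment above. This is a simplified stand-in and not Anubis's actual policy engine; the function name `should_challenge`, the regex, and the example user-agent values are all assumptions made for illustration.

```python
import re

# Assumption: a single "looks like a browser" pattern, as described in the comment.
# Anubis's real rule set is configurable and more detailed than this.
BROWSER_LIKE = re.compile(r"Mozilla")


def should_challenge(user_agent: str) -> bool:
    """Return True when the User-Agent looks like a graphical web browser."""
    return bool(BROWSER_LIKE.search(user_agent or ""))


# Hypothetical example values; the Lemmy string follows the
# "Lemmy/{version} +{hostname}" shape quoted in the comment.
firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
lemmy_ua = "Lemmy/0.19.12 +lemmy.example"

for ua in (firefox_ua, lemmy_ua):
    verdict = "challenge" if should_challenge(ua) else "pass through"
    print(f"{ua!r}: {verdict}")
```

Under this rule a federating Lemmy server is passed through untouched because its user agent never claims to be a browser, while anything presenting a browser-like string is challenged.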

    • deadcade@lemmy.deadca.de
      2 months ago

      “Yes”, for any bits the user sees. The frontend UI can be behind Anubis without issues. The API, including both user and federation, cannot. We expect “bots” to use an API, so you can’t put human verification in front of it. These “bots” also include applications that aren’t aware of Anubis, or are unable to pass it, like all third-party Lemmy apps.

      That does stop almost all generic AI scraping, though it does not prevent targeted abuse.
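The split described in the comment above (frontend behind the check, API and federation exempt) could be sketched as a routing decision like the one below. The path prefixes, the `Accept` header test, and the name `needs_human_check` are assumptions for illustration only; a real deployment would express this in the reverse proxy configuration rather than in Python.

```python
# Assumption: illustrative prefixes for machine-facing endpoints on a Lemmy instance.
API_PREFIXES = ("/api/", "/pictrs/", "/nodeinfo/", "/.well-known/")

# Assumption: content types that federation traffic typically asks for.
ACTIVITYPUB_TYPES = ("application/activity+json", "application/ld+json")


def needs_human_check(method: str, path: str, accept: str = "") -> bool:
    """Return True only for requests that look like a person loading the web UI."""
    if method.upper() != "GET":
        return False  # e.g. inbox deliveries from other instances
    if path.startswith(API_PREFIXES):
        return False  # user API, used by third-party apps
    if any(content_type in accept for content_type in ACTIVITYPUB_TYPES):
        return False  # federation fetches of actors and posts
    return True


print(needs_human_check("GET", "/post/123", "text/html"))
print(needs_human_check("GET", "/api/v3/post/list", "application/json"))
print(needs_human_check("POST", "/inbox", "application/activity+json"))
```

Only the first request, a plain page load, would be sent through the challenge. As the comment notes, this stops generic scraping of the HTML pages but leaves the API open to anyone who targets it directly.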

      • Captain Beyond@linkage.ds8.zone
        1 month ago

        “The API, including both user and federation, cannot.”

        This is theoretically an issue; in practice, however, Anubis only weighs requests that appear to come from a browser: https://anubis.techaro.lol/docs/design/how-anubis-works

        I just tested my instance with Jerboa and it seems to work just fine.

    • interdimensionalmeme@lemmy.ml
      2 months ago

      Yes, it would make Lemmy as unsearchable as Discord, instead of as unsearchable as Pinterest.

      • bdonvr@thelemmy.club
        1 month ago

        That’s not true; from what I read here, search indexer bots should be allowed through.

        • interdimensionalmeme@lemmy.ml
          1 month ago

          If you allow my SearXNG search scraper, then an AI scraper is indistinguishable from it.

          If you mean “Google and DuckDuckGo are whitelisted”, then Lemmy will only be searchable there, on those specific whitelisted hosts. And the Google search indexer is also an AI scraper bot.

    • infinitesunrise@slrpnk.net
      2 months ago

      Yeah, it’s already deployed on slrpnk.net. I see it momentarily every time I load the site.

    • Resonosity@lemmy.dbzer0.com
      1 month ago

      To be honest, I need to ask my admin about that!

      • 𝔽𝕩𝕠𝕞𝕥@lemmy.dbzer0.com
        1 month ago

        We don’t use Anubis, but we use iocaine (?); see /0 for the announcement post.
