• renegadespork@lemmy.jelliefrontier.net · ↑72 · 4 days ago

    There’s no excuse for this crap. Even if they insist on scraping every FOSS repo, there needs to be some logic to it (caches, diffs, longer intervals). These AI scrapers are so poorly thought out they are indistinguishable from DoS attacks.
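The "caches, diffs, longer intervals" point takes very little code to implement. Here is a minimal sketch (the `PoliteCache` class and its parameters are my own invention, not from any real crawler): track an ETag and a minimum revisit interval per URL, so unchanged pages are either skipped entirely or fetched as cheap conditional requests.

```python
class PoliteCache:
    """Sketch of client-side cache state for a polite scraper: skip
    refetching a URL until min_interval seconds have passed, and keep
    ETags so revisits can be conditional (If-None-Match -> 304)."""

    def __init__(self, min_interval=3600.0):
        self.min_interval = min_interval
        self.entries = {}  # url -> (etag, fetched_at)

    def should_fetch(self, url, now):
        # Never seen, or stale enough to be worth revisiting.
        entry = self.entries.get(url)
        return entry is None or now - entry[1] >= self.min_interval

    def conditional_headers(self, url):
        # Headers to attach so the server can answer 304 Not Modified.
        entry = self.entries.get(url)
        return {"If-None-Match": entry[0]} if entry and entry[0] else {}

    def record(self, url, etag, now):
        self.entries[url] = (etag, now)
```

A scraper that consulted something like this before every request would hit an unchanged page at most once per interval, instead of hammering it at DoS-like rates.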

    • DigitalDilemma@lemmy.ml · ↑28 · 4 days ago

      You’re not wrong.

      Claudebot took down one of my sites repeatedly, hammering it for the same pages over and over at horrendous rates.

      I ended up spending several days converting it to an SSG and hosting it on Cloudflare Pages. A lot of work I didn’t need to do.

  • Poof [none/use name]@hexbear.net · ↑10 · 3 days ago

    Up front, my computer knowledge is pathetic, but I have been enjoying articles like this because they show what is often referred to as the tragedy of the commons. This practice of companies extracting from systems and destroying them is often used, ironically, as an argument that community management is bad. We are seeing right now the plundering and destruction of publicly held assets in a new way, and hopefully we will see these open source resources protected and not privatized, as land, borders, oceans, and most other things have been, to the detriment of the many.

  • DarkAri@lemmy.blahaj.zone · ↑7 ↓1 · edited · 3 days ago

    One good solution might be decentralized web hosting and infrastructure. If people want high-speed access, they have to seed it at a low rate. Nothing crazy is required, just constant seeding at low rates: a few max connections and a cap of maybe 50 KB/s minimum. You also use random hash checking against multiple clients to verify correctness, and you have p2p blacklisting of known bad actors. A solution for people who don’t have steady internet access might be to seed a similar amount on a quota, or to buy p2p credits to access the content.
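The "random hash checking against multiple clients" part could look something like this. A hedged sketch, not a real protocol: the `verify_chunk` function and the peer interface (`report_hash()`) are hypothetical names for illustration. A downloader hashes the chunk it received, polls a random sample of peers for the hash they hold, and takes the majority answer; peers disagreeing with the majority become candidates for the shared blacklist.

```python
import hashlib
import random
from collections import Counter

def verify_chunk(chunk, peers, sample=3, rng=random):
    """Compare our SHA-256 of `chunk` against a majority vote from a
    random sample of peers. Returns (chunk_is_honest, dissenting_peers)."""
    digest = hashlib.sha256(chunk).hexdigest()
    polled = rng.sample(peers, min(sample, len(peers)))
    # Each polled peer reports the hash it believes is correct.
    votes = Counter(peer.report_hash() for peer in polled)
    majority, _ = votes.most_common(1)[0]
    honest = digest == majority
    # Peers that disagree with the majority go to the blacklist pipeline.
    dissenters = [p for p in polled if p.report_hash() != majority]
    return honest, dissenters
```

The design choice worth noting: sampling randomly per check means a bad actor can't predict which downloads it can safely corrupt, so even a small sample keeps cheating risky.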

    This is part of an idea I have been working on for an internet 2.0. There is a basic cryptocurrency called compute coin, and as you seed you mine coin. Verification works by using many nodes to check that everything is working correctly. You can buy coin that people mine instead of seeding if you wish. A very small portion of each transaction, a fraction of a fraction of a percent, goes to a nonprofit that works on the tech and formalizes the standard. You can also just buy compute time on the network. People are free to sell their resources at whatever price they want, and the market balances itself automatically. The transaction fee is calculated automatically to cover the budget of the nonprofit that oversees it. DNS is also handled p2p. If I were in control I would also make it mesh-network friendly, with sliders to prioritize what the user wants: latency, avoiding certain countries, cost, or using a whitelist of trusted nodes for security purposes.

    This would also require swarm security to prevent any one user from being able to sniff out keys, passwords, and the like. Basically, the network would work together to generate periodic temporary keys that allow machines to access the data for a period of time without revealing themselves to anyone. The nonprofit would be the only completely trusted authority; it would have a board that oversees the banning of nodes, and the money seized would go to support the development of solutions and strategies to combat fraud on the network. This seems expensive at first, but with ASICs it can be done very cheaply. I imagine people would want to run different types of nodes to generate currency; ASICs would be a good, cheap option. It’s a market that would build itself very quickly. You could also have verified nodes that cost a bit more than average to access but provide additional security for certain tasks. These could either be crowd-sourced or run by institutions that publish node lists.

    If the law wishes to regulate it, that could only be achieved with region-specific keys and would not be network-wide. Courts might have to set up a way to subpoena resources that gets registered in a public record and released to the public after a year or so, so that citizens can verify what the government is snooping on.

    It could also be used for free, by having an algorithm automatically determine mining rates that pay for the user’s usage, with a buffer to keep the experience seamless.

    Perhaps for this particular problem, though, you could just set up a p2p platform with file verification. People could offer free nodes; businesses could pay those nodes for access to high speeds and large amounts of bandwidth. People could also join p2p nodes where what they can download equals what they contribute to the network. This could be estimated from average use and maintained with a buffer if people really don’t want to seed all the time.

  • deathbird@mander.xyz · ↑7 · 3 days ago

    So what’s the solution to keep abusive access from pseudo-DDoSing open projects? IP range blocks? Keeping access behind accounts as a speed bump for bots and a tool for tracking abuse?

    • other_cat@lemmy.zip · ↑1 · 1 day ago

      Anecdotally, I only keep services that require an account to get past the landing page on my VPS. (I’m very new to self-hosting and haven’t figured out ways to stop bots/scrapers yet.) So I think, if nothing else, that’s a pretty easy starting point.

  • Aquaphobi@lemmy.zip · ↑3 ↓24 · 4 days ago

    You cannot be open source and also gate keepy. Free and open is free and open.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP) · ↑23 ↓1 · 3 days ago

      Helps to read the actual article before commenting. The freeloading refers to corporations hammering hosting infrastructure run by volunteers.

        • ☂️-@lemmy.ml · ↑10 · 3 days ago

          We can’t. But we can impose boundaries on corporations destroying our entire work for a few bucks.

          Open source doesn’t mean being a pushover to capital. Read the GPL if you want an idea of how we do this the proper way.

        • rmrf@lemmy.ml · ↑7 · 3 days ago

          I’m gonna take a shot and guess you’ve never run infrastructure in your life if you can’t differentiate between a product covered by a license and a service offered as a courtesy.

        • brygphilomena@lemmy.dbzer0.com · ↑1 · 3 days ago

          What? This isn’t about hiding the code and making it unavailable. It’s saying that they shouldn’t have to pay for some very profitable corporation to use obscene amounts of their bandwidth.

          Basically, you get so many downloads before you’re throttled and deprioritized. Don’t download the same thing hundreds of times over and over every day.
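That "so many downloads, then deprioritize" policy is simple to express server-side. A minimal sketch, with hypothetical names (`DownloadBudget`, `throttled`) chosen for illustration: each client gets a daily byte budget, and once it is spent, requests get flagged for the slow lane rather than blocked outright.

```python
class DownloadBudget:
    """Per-client daily byte budget. Over-budget clients are not
    refused, just flagged so the server can throttle/deprioritize."""

    def __init__(self, daily_bytes=10 * 2**30):  # default: 10 GiB/day
        self.daily_bytes = daily_bytes
        self.used = {}  # client_id -> (day, bytes_used_that_day)

    def throttled(self, client_id, size, day):
        last_day, used = self.used.get(client_id, (day, 0))
        if last_day != day:
            used = 0  # budget resets each day
        self.used[client_id] = (day, used + size)
        return used + size > self.daily_bytes
```

A scraper that re-downloads the same tarball hundreds of times burns through its budget in minutes and spends the rest of the day at a trickle, while normal users never notice the limit exists.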