• melpomenesclevage@lemmy.dbzer0.com
    21 hours ago

    I hear there’s a tool called (I think) ‘nepenthe’ that traps an LLM crawler in a loop. If you use that in combination with a fairly tight blacklist of IPs you’re certain are LLM crawlers, I bet you could do a lot of damage, and maybe make them slow their shit down or crawl in a more reasonable way.
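
A rough sketch of the "tight blacklist" part of that idea, assuming the blocklist is kept as a handful of CIDR ranges. The ranges below are reserved documentation addresses, not real crawler IPs, and `is_blocked` and `route` are hypothetical helper names rather than anything from an existing tool:

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (RFC 5737),
# standing in for ranges you are certain belong to LLM crawlers.
BLOCKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(addr: str) -> bool:
    """Return True if the client address falls inside any blocklisted range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKLIST)


def route(addr: str) -> str:
    """Send blocklisted clients to the tarpit, everyone else to the real site."""
    return "tarpit" if is_blocked(addr) else "site"
```

Keeping the list short and certain matters here: anything that matches gets the junk pages, so a false positive means a real visitor ends up in the loop.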

    • PrivacyDingus@lemmy.world
      15 hours ago

      nepenthe

      It’s a Markov-chain-based text generator, which could be difficult for people to implement on repos depending on how they’re hosting them. Regardless, any sensibly built crawler will have rate limits. This means that although Nepenthe is an interesting thought exercise, it will only affect crawlers knocked together by people who haven’t thought about it, not the Big Big companies with the real resources, who are likely having the biggest impact.
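
For anyone unfamiliar with the term, here is a minimal sketch of what a Markov-chain text generator does in general. It is not the tool's actual code; `build_chain` and `babble` are hypothetical names, and a real deployment would train on a much larger corpus:

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain


def babble(chain, start, length, seed=None):
    """Walk the chain, picking a random follower of the current word each step."""
    rng = random.Random(seed)
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if followers:
            word = rng.choice(followers)
        else:
            # Dead end (word never had a follower): jump to a random known word.
            word = rng.choice(list(chain))
        out.append(word)
    return " ".join(out)
```

Calling `babble(build_chain(corpus), "the", 200)` produces text that looks locally plausible but carries no meaning, which is the property that makes it cheap filler to serve to a crawler.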

      • melpomenesclevage@lemmy.dbzer0.com
        9 hours ago

        Might hit a few times, or maybe there’s a version that can puff up the data in terms of space, and salt it in terms of utility.