• 0 Posts
  • 63 Comments
Joined 3 years ago
Cake day: June 30th, 2023


  • you are taking a risk either way. You are placing your trust in the dev and the few that can read code.

    There is definitely a trust issue and a need for ways of conveying and building trust in smaller software projects. I think a much better solution there would be discussions about the code and how it works that aren’t hostile interrogations with foregone conclusions in pursuit of a broader anti-AI agenda. If someone just put a lot of effort into making something, the details of that process should be on their mind, and it should be possible to make them more accessible to people and convey that there is non-artificial understanding behind the project. Automatic hostility and suspicion make those kinds of conversations harder and less likely.







  • If that is the case, is chardet 7.0.0 a derivative work of chardet, or is it a public domain LLM work? The whole LLM project is fraught with questions like these

    I think the reimplementation stuff is a separate question because the argument for it working looks a lot stronger, and because it doesn’t have anything to do with the source material having LLM output in it. Also if this method holds as legally valid, it’s going to be easier to just do that than justify copying code directly (which would probably have to only be copies of the explicitly generated parts of the code, requiring figuring out how to replace the rest), which means it won’t matter whether some portion of it was generated. I don’t see much reason to think that a purist approach to accepting LLM code will offer any meaningful protection.

    I’m mostly just playing along with your thought experiment. As I said, we know that projects are already accepting LLM code into projects that are nominally copyleft.

    So what though? If they aren’t entirely generated, you can’t make a full fork, and why would a partial fork be useful? If it isn’t disclosed what parts are AI, you can’t even do that without risking breaking the law.


  • but if they instead say that they copied the work into their LLM and produced a copy without protections (as chardet has done), the courts might be less willing to afford the project copyright protections if the project itself was making use of the same copyright stripping technology to strip others’ work to claim protections over copied work.

    ianal but does it even work like that? Is there any specific reason to think it does? I don’t believe you really get credit for purity and fairness vibes in the legal system. Same goes for the idea that code where it is ambiguous whether it is AI output could be considered public domain, seems kind of implausible, is there actually any reason to think the law works that way? If it did, then any copyrighted work not accompanied by proof of human authorship would be at risk, uncharacteristic for a system focused on giving big copyright holders what they want without trouble.

    the only code that may ultimately be protected is closed source code - you can’t copy it if you don’t have the source.

    There is no way, leaks happen, big tech companies have massive influence, a situation where their code falls into the public domain as soon as the public gets their hands on it just isn’t realistic. I feel suspicious that many of these concerns are coming from a place of not wanting LLM code in open source projects for other reasons, rather than the existence of a strong legal case that it represents a real and serious threat to copyleft licensing.


  • AI code damages copyleft projects no matter what - we know that some projects are already accepting AI generated code, and they don’t ask you to hide it - it is all in the open.

    I don’t see how that follows or contradicts what I’m saying though. They could hide it, easily. Even if they don’t hide it, how useful would it really ever be to only use the portions of the codebase that have been labelled as having been AI generated? Can one even rely on those labels? Making use of the non-copyrightability of AI output to copy code in otherwise unauthorized ways does not seem like a straightforward or legally safe thing to do. That’s especially the case because high profile proprietary software projects also make heavy use of AI, it doesn’t seem likely the courts will support a legal precedent that strips those projects of copyright and allow anyone to use them for whatever. So basically I’m not at all convinced about the idea that AI code damages copyleft projects, it seems unlikely to be a problem in practice.


  • The only portions of the work that can be copyrighted are the actual creative work the person has put into the work.

    Ok, but it’s not like everyone is documenting exactly which parts are generated, curated, or human written.

    Maintainers cannot prevent the LLM code from being incorporated into closed source projects without reciprocity

    Say someone incorporates GPL code without attribution, and gets sued for doing so. They try to make the argument in court that the source material they used is not copyrighted, because of AI. Won’t they have to prove that the parts they used were actually AI output for this defense to work? It isn’t like people are going around ignoring the copyright on things in general if they look like they were probably generated with AI, that isn’t enough to be safe from prosecution, because you usually can’t know the exact breakdown. It seems like preventing this loophole from being used would be as simple as keeping it ambiguous and not allowing submissions that positively affirm being entirely AI generated.



  • Both incidentally categories where I will never be happy with slopcode.

    The point here isn’t necessarily that any particular use of LLMs is a good tradeoff (I can accept that many will not be, especially when security and correct operation are very important), just that quantity clearly matters, to refute the point you were making earlier that it doesn’t.

    We are actively building a history of cases where LLM usage correlates heavily with that slope you mentioned, but hey that’s OK, we aren’t allowed to call things out before they happen, judgement may only be passed once the damage is done right?

    Out of curiosity, we know that LLM usage increases cognitive deficit and in some cases leads to psychosis. How many fatalities would you say is an acceptable number before governments act? How degraded do we let our societies get before we rein it in?

    I think it’s a mistake to consider all LLM usage as one thing, and that thing as some kind of sin to be denounced as a whole rather than in part, and not considered beyond thinking of ways to get rid of it (which is effectively impossible). There were people who had this attitude towards for example electricity, which is actually very dangerous when misused and caused lots of fires and electrocutions, but the way those problems eventually got mitigated was by working out more sensible ways to use it rather than returning to an off-grid world.


  • One example of a place where quantity is lacking is web browsers. Another might be mobile operating systems. I am glad projects like Firefox and GrapheneOS exist, but it’s obvious that the volume of work needed to achieve broad compatibility and competitiveness for these types of software is a limiting factor. As for the idea that any LLM use is a slippery slope, the way to avoid the slippery slope fallacy would be to have compelling evidence or rationale that any use really does lead naturally to problematic use. Without that, the argument could apply to basically any programming thing that gets associated with things done badly (e.g. Java). I think it isn’t usually the case that a popular tool has genuinely no good or safe ways to use it, and I don’t think that’s true for AI.


  • I will complain about quantity. In many areas where open source projects are competing with closed source commercial products, they have not achieved feature parity or a comparable level of polish, so quantity matters. So do, as someone else touched on, quality of life improvements to the process of writing code, like ease of acquiring and synthesizing information. That doesn’t mean it’s necessarily a worthwhile tradeoff, but how much is really being sacrificed depends on what exactly is being done with an LLM. To me, one part of what’s described here that’s clearly going too far is using it to automate communication with other people contributing to the project; there’s no way that is worth it.

    As for the gun thing, I will support entirely banning LLM powered weapons intended to kill people, that’s an easy choice.




  • The main complaints about Matrix I’ve heard though are about behind the scenes stuff rather than features, which the video touches on:

    But there are some reasons why I think XMPP is superior. In Matrix, when you join a room, your server downloads and stores the entire history of that room. If someone on a federated server posts illegal content in a room you’re in, your server is now hosting it, and you are liable. Whereas in XMPP, messages are relayed in real time. Group chat (MUC) history stays on the server hosting that room. So your server only stores messages for your users, which means that there is no content caching from other servers. This is a fundamental architectural difference which makes the XMPP protocol better in my opinion.

    Personally I don’t know that much about it but I briefly looked into what it would take to write a client for Matrix a few years ago and it seemed pretty daunting to work with. Maybe it would be possible to write software that implements more Discord features on top of XMPP to have something that works more smoothly.
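
    To make the storage difference in that quote concrete, here is a toy sketch in Python. It is not real Matrix or XMPP code: the `Server` class and both `post_*` functions are made up for illustration, and it ignores things like per-user message archives. It only models the claim as quoted, that one design copies room history to every participating server while the other keeps it on the hosting server.

```python
class Server:
    """Hypothetical stand-in for a federated chat server."""

    def __init__(self, name):
        self.name = name
        self.stored = []  # room messages persisted on this server


def post_replicated(message, participating_servers):
    """Replicated-history model (the behaviour the quote attributes to Matrix):
    every server with a member in the room keeps its own copy."""
    for server in participating_servers:
        server.stored.append(message)


def post_relayed(message, host, participating_servers):
    """Relay model (the behaviour the quote attributes to XMPP group chat):
    only the hosting server persists room history; the rest just forward."""
    host.stored.append(message)
    # Names of servers that relay the message without storing it.
    return [s.name for s in participating_servers if s is not host]


if __name__ == "__main__":
    a, b, c = Server("a.example"), Server("b.example"), Server("c.example")
    post_replicated("hello", [a, b, c])
    print({s.name: len(s.stored) for s in (a, b, c)})

    x, y, z = Server("x.example"), Server("y.example"), Server("z.example")
    post_relayed("hello", host=x, participating_servers=[x, y, z])
    print({s.name: len(s.stored) for s in (x, y, z)})
```

    In the first model every server ends up holding the message, in the second only the host does, which is the liability difference the video is pointing at.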



  • If your focus is LLMs, get a 3090 GPU. VRAM is the most important thing here because it determines what models you can load and run at a decent speed, and having 24 GB will let you run the mid range models that specifically target this amount of memory, since it is a very standard amount for hobbyists to have. These models are viable for coding; the smaller ones are less so. Looking at prices, it seems like you can get this card for 1-2k depending on if you go used or refurbished. I don’t know if better price options are going to be available soon, but with the RAM shortage and huge general demand it kind of doesn’t seem like it.

    If you want to focus on image or video generation instead, I understand that there are advantages to going with newer generation cards, because certain features and speed are more of a factor there than just VRAM, but I know less about this.
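
    For the LLM case, a rough back-of-the-envelope way to check whether a model fits in 24 GB is weights (parameter count times bits per weight) plus some headroom. The sketch below assumes a flat 2 GB allowance for KV cache and runtime buffers, which is my guess rather than a measured number; real usage depends on context length and the runtime. The function names and the example model sizes are just for illustration.

```python
def estimate_vram_gb(params_billion, bits_per_weight, overhead_gb=2.0):
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    Weights take params * bits / 8 bytes. overhead_gb is an assumed flat
    allowance for KV cache and buffers, not a measured figure.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 + overhead_gb


def fits(params_billion, bits_per_weight, vram_gb=24.0, overhead_gb=2.0):
    """True if the rough estimate is within the card's VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight, overhead_gb) <= vram_gb


if __name__ == "__main__":
    # Hypothetical sizes: (billions of parameters, bits per weight)
    for size, bits in [(7, 16), (32, 4), (70, 4)]:
        need = estimate_vram_gb(size, bits)
        print(f"{size}B at {bits}-bit: ~{need:.1f} GB, fits in 24 GB: {fits(size, bits)}")
```

    By this estimate a model around 30B parameters at 4-bit quantization lands under 24 GB, while a 70B one at the same quantization does not, which matches why the mid range models target this card size.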