• 0 Posts
  • 37 Comments
Joined 1 year ago
cake
Cake day: June 7th, 2025

help-circle
  • I’m not an expert by any means I’m just a dabbler, but my understanding is: In theory, more parameters make richer, wider, and deeper model knowledge possible, and with extensive enough training, those parameters could all be important. That said, there is a lot of megapixel-like inflation and there is no guarantee that any of those parameters are actually useful so in practice, really “advanced” models tend to do a better job of maximizing the usefulness of the limited parameters they do have to run on smaller devices. In general, I tend towards the highest parameter size of a particular model that I can reasonably run. My typical target range is between 8GB up to maybe 20GB, which depending on model might be in the 9b to 30b parameters range, and I might even be erring on the wrong side of this and maybe I’d even be better off with smaller parameter models.

    There’s also a lot of models nowadays that use “active” parameters, so the model itself will have X parameters, but then it will determine which of those parameters are most relevant to the task or query at hand, and prune off all but the most relevant ones, so you might have a 30B model, but as soon as you run it, it turns itself into a specialized 4B model. You still need to load the whole model into some kind of RAM typically so it can decide which parameters are relevant, but once it does, it will run much faster. This is another way you can try to run larger models on more limited hardware. Older “dense” models that don’t use this technique with all parameters always active are still typically preferred for some tasks like coding, but YMMV.

    Either way, it’s still sort of a crapshoot, there’s a lot of randomness and subjectiveness, and very small parameter models often seem to realistically be able to outperform much bigger models when they are “good”, “well-trained” advanced models, and they will typically be much faster, so if you don’t like the response, it’s much easier to just ask again or retry. I tend to trust the community wisdom when it comes to this, although I also think there’s a lot of cargo-culting and herd-following going on, I don’t know enough to do anything too much different from the herd myself, other than be willing to experiment a little. Latest is not always greatest, but in a field as quickly moving as this it often is. Don’t be afraid to try older models, or less popular models. You’ll often be disappointed, but not always.

    Quantization is a form of compression, basically instead of using floating point precision to weigh the “strengths” of the various parameters (default is typically F16 or 16 bits per parameter weight), they get quantized down to smaller groups of bits. Q4 means you’re using 4 bits (essentially ranking each parameter on an integer scale from 0 to 15 instead of a floating point from 0 to 1) and in practice this is usually almost as good. Q8 would be even closer to the original full-size model, but smaller quants like Q2 and Q3 start losing quality. Other quantization-related techniques like i-Matrix (imat) map these values non-linearly and situationally, which is particularly helpful on quantizations Q3 and smaller, which are then called IQ3. The community has adopted Q4 as pretty much the go-to quantization level as the best available compromise between having more parameters being squeezed into less memory without destroying the inherent accuracy of those parameters.


  • For chat usage (which is strictly a more efficient way to generate code on the LLM’s part, although you have to keep carefully guided and compartmentalized otherwise it typically requires a lot more testing and sometimes back-and-forth iteration on your part) 12GB is plenty to run many decent LLMs, you’ll typically want to use a Q4 quantization to make models with larger parameter fit into smaller memory, sometimes an IQ2 or IQ3 if you really want a particular model.

    For agentic usage (where the LLM is trained and optimized to use a harness like this to start requesting tool calls and getting their results and using the results of the tool calls to inform what it’s trying to do) it’s quite a bit more challenging to do on consumer hardware at a tolerable speed. The tools often generate large amounts of output which then take a long time to process, and the models and harnesses are both typically quite a bit stupider about using your limited resources efficiently. If you’re using to commercial “frontier” agentic models like Claude Code you’re going to have a bad time.

    That said, it is absolutely possible to do agentic AI on consumer hardware (just the GPU you have, not 6 of them), as long as you’re reasonably patient, using a harness properly tuned for efficiency. Out-of-the-box, many if not most are designed for remote API usage, even the “open source, local” ones realistically rely on free tier APIs and are inherently wasteful in terms of them not really caring how many tokens you burn in these remote datacenters and they’re expecting to just be able to iterate over and over again until they get it right. You don’t have that luxury when you’re getting slow tokens.

    Is PewDiePie’s any better or more efficient? I don’t know, I haven’t tried it yet. I prefer more minimal harnesses personally, OpenCode is about the most usable I’ve found personally, although I’m starting to experiment with Pi-mono (called Pi, but that’s unsearchable) which seems very promising, and I know quite a few people who have had good successful agent usage with Hermes Agent.

    I’m not going to pretend it’s going to be easy or that you’ll necessarily have very good results. I am pretty lukewarm on AI as a whole, but I am personally deeply invested in making sure I have fully local access to it in as much capacity as is currently technologically possible, as a personal digital sovereignty issue.

    As for hardware, I have a 12GB card myself and you don’t really need to fit everything into VRAM these days. I have an AMD X3D CPU which allows me to offload some of the model to system RAM with pretty decent performance, maybe it’s prohibitive on different architectures or configurations I don’t know but it’s worth a try. glm-4.7-flash:Q4_K_M from ollama is the model I’ve had the most consistent success with and with ollama running it with the context window set to 50,000 (context should also be set to be quantized to Q4_K_M), I end up with almost half of it offloaded to system RAM and it still runs quite fast thanks to the flash attention feature. I’ve worked with gemma4 quite a lot too and it’s definitely really fast but it’s also a bit unstable/weird at times, at least the heretic version hf.co/Stabhappy/gemma-4-26B-A4B-it-heretic-GGUF:Q4_K_M I’m running is. Still, if you really do need to fit everything into a smaller set of RAM you might try the gemma4 E4B models which clock in around 9GB when quantized. Qwen3.6 is I guess supposed to be really good too and should fit nicely on your 12GB card, but I haven’t had much opportunity to play with it yet. Qwen3 and 3.5 felt rather disappointing to me for agentic use but YMMV.

    You’re not completely going to outsource all software and all code you write to AI using a local model, the way companies are doing with those commercial models. But I consider that an advantage, not a flaw. I find it’s much more useful to have it help, suggest and advise, not to completely replace everything I’m doing. Yes, sometimes it’s slow and sometimes it’s wrong, but so are other people when I ask them sometimes. I’m prepared for it, and you should be too. Don’t get complacent.


  • Yes. mine is exposed publicly (with fail2ban) on a VPS with a public IP and a public DNS name and it’s fine. Use a minimal configuration that meets your needs, use secure passwords like you would for any public service and keep it up to date, and stay aware of any potential news that might make you aware of any severe and widespread vulnerabilities in the future (there haven’t been any in Nextcloud so far). It is not nearly as terrifying as people make it out to be to share public services on the public internet. Most decent software is secure-by-default. Yes vulnerabilities and attacks can happen but they are the exception not the rule.


  • Maybe if the model trains could actually bring in your groceries and mow your lawn they’d be comparable. Granted, self-hosted software can’t do those exact things either, but it can do an awful lot of the digital stuff that’s part of our lives now which often takes up just as much time and effort if not more. Model trains are a banger hobby, but homelabbing can easily be more than just a hobby, it’s deeply practical too, and I’d argue it’s actually a necessity for establishing personal digital sovereignty and privacy going forward.



  • Email chains and mailing lists are not really a practical way to develop anymore, and it is increasingly anachronistic (as is the idea of tying your identity to an email which is also baked into basic git). This was the only realistic democratic and federated option when git was designed, but it was never the ideal one. Forgejo is trying to build a better, more ideal, also-federated alternative that is really designed for code collaboration from the ground up. Once the design is stabilized, there’s no reason it couldn’t get built into git also. I would love to be able to create a PR with git itself and have it automatically submitted to the origin repository.


  • Find something on craigslist or local pickup on ebay, check government/police surplus, or do some freecycling. At least in my area a lot of people leave their e-waste computers at Best Buy, often in the doorway, nobody cares if you come and pick them up. Even if they’re broken (and they’re often perfectly functional and sometimes surprisingly powerful) it likely only takes a few before you’ve got some functional combination of parts.

    It’s likely not as much of a picker’s heaven anymore since I imagine the huge wave of windows-10-obsolete computers being thrown away for no reason has probably mostly subsided, but there is so much old and perfectly functional stuff out there it’s really unjustifiable to be buying something new especially at today’s modern prices.


  • The only problem with something like Revanced is that it can go away at literally any time. It could be shut down tomorrow and you’d lose access to everything it provides. That’s fine, or at least tolerable, if you ALSO have something self-hosted you can rely on in case that happens. If you don’t have downloaded music self-hosted, then you’re totally relying on Revanced permanently and you lose everything if it goes away. Maybe for something like music that’s an acceptable risk, but you have to consider it and decide where it is an acceptable risk. What are you going to do if those services you’re relying on go away?

    Self-hosting, like you said, is about the independence, and the knowledge that once it’s up and running on your own hardware, it won’t just go away on its own, and it can’t just get “shut down” unless you choose to. You might not need that for every service you rely on, but there are probably at least some you would struggle without, and those are things you should consider self-hosting. The more you think about it, and the more comfortable you get with it, the more likely you’ll decide other things are important enough to self-host after all.


  • Gitea is developed by a corporation. If you trust corporations not to enshittify eventually, maybe Gitea will be the exception to the rule, but I doubt it, for sufficiently long definitions of “eventually”. Forgejo was forked specifically because the governance needed to be detached from the corporation, and that wasn’t going to happen with Gitea. The community of open-source developers mostly voted with their feet. Forgejo is, in my humble opinion, going places. Gitea is not. Nothing specifically wrong with it, per se, but it doesn’t really offer a sustainable development path forward I don’t think.



  • The simple, maybe unhelpful answer is that fail2ban needs to have two things at once: the logs, and a way to block the network traffic.

    Where exactly you want those things to coincide is really up to you, there might only be one point that simultaneously has access to both those things, or there might be multiple points depending on how your systems and services and network is configured, or if you’re in a bad situation you might find you don’t really have any single point where both those things are simultaneously possible, in which case you’ll need to reconfigure something until you do have at least one point where both those things are again coincident.

    As far as best practices, I can’t really say for sure, but I know that one of the more convenient ways to run it is usually on the same system, I usually run it outside of docker, on the host, which can pretty easily get access to the container’s logs if necessary, and let fail2ban block traffic on the whole system. For me, any system running any publicly accessible network services that allow password login gets a fail2ban instance.

    A whole-network approach where you block the traffic on the firewall is fine too, if that’s what you prefer and what you want to work towards, but it’s probably going to be significantly more complex to set up because now you need to either figure out how to get fail2ban to be able to access your firewall or a way for your firewall to get the logs it needs.


  • It’s literally the core foundation of my entire self-hosting configuration. I could not live without Forgejo. I can’t imagine being shackled to Github or some other hosted provider anymore for something as important as my git repositories.

    Gitea’s okay too in every practical respect, but Forgejo is the more community-led fork and in my opinion less likely to be corporatized and enshittified far in the future, so I’ve hitched my wagon there and couldn’t be happier. The fork is starting to diverge slowly, so it seems like direct migration is no longer possible. That said, git repositories are git repositories, and they have most of the important history and stuff inside them already, so unless you’re super attached to stuff like issues and whatever you can still migrate, you’ll just lose some stuff.


  • You don’t have any great options but you do have some options. You’ll need dynamic DNS, which you can get for free by various providers. This will manage a “dynamic” DNS entry for your occasionally changing, non-static IP at home. The dynamic DNS entry won’t be on your own domain name, it will be on the provider’s domain name. But wait! That’s just step one.

    You can still get your own, fully-functional domain name, and you can have all the domains and subdomains you want, and set them up however you want, with one important restriction: You can’t use IP addresses (because yours is dynamic, and changes all the time and you would have to be constantly updating your domain every time it does, and there would be delays and downtime while everything gets updated).

    Instead, your personal domains have to use CNAME records. This substitutes the IP from a different domain INTO your domain. So you CNAME every entry on your own fancy domains to point at your dynamic DNS provider, which manages the dynamic part of the problem for you and always gives the real IP you need. Nobody sees the dynamic DNS name, it’s there, but it’s happening behind the scenes, they still see your fancy personalized domain names.

    It’s still not going to be perfect, it won’t work well or at all for certain services like email hosting (self-hosting this is not for the faint of heart anyway) that are very strict about how their DNS and IP addresses need to be set up, but it will likely be good enough for 99% of the stuff you want to self-host.






  • I think there’s room for a little bit of nuance that page doesn’t do a great job of describing. In my opinion there’s a huge difference between volunteer maintainers using AI PR checks as a screening measure to ease their review burden and focusing their actual reviews on PRs that pass the AI checks, and AI-deranged lone developers flooding the code with “AI features” and slopping out 10kloc PRs for no obvious reason.

    Just because a project is using AI code reviews or has an AGENTS.md is not necessarily a red flag. A yellow flag, maybe, but the evidence that the Linux Kernel itself is on that list should serve as an example of why you can’t just kneejerk anti-AI here. If you know anything about Linus Torvalds you know he has zero tolerance for bad code, and the use of AI is not going to change that despite everyone’s fears. If it doesn’t work out, Linus will be the first one to throw it under the bus.



  • It’s being built inch by inch. You won’t even know it’s there until you realize you can’t squeeze through it anymore. The trend is extremely obvious: TPM, Secure boot, Windows Store UWP applications, forced updates without consent, or intentional opt-outs that conveniently get ignored or forgotten when it’s convenient for Microsoft to force something. They are intent on taking full control of PCs and locking them down exactly the same way Android phones are locked down, they will follow a few footsteps behind what Android is doing now by preventing third-party apps and app stores, but it’s obviously coming, because they are on exactly the same path for exactly the same reasons.

    I don’t imagine we can save everybody either. But that doesn’t mean it’s not worth trying. The more they tighten their grip, the more will slip through their fingers, and all I care about is that the rebellion against Windows grows large enough to survive indefinitely, if not thrive.