What is a self-hosted small LLM actually good for (<= 3B)

catty@lemmy.world · edit-2 22 days ago

What is a self-hosted small LLM actually good for (<= 3B)

HelloRoot@lemy.lol · edit-2 22 days ago

Sorry, I am just gonne dump you some links from my bookmarks that were related and interesting to read, cause I am traveling and have to get up in a minute, but I’ve been interested in this topic for a while. All of the links discuss at least some usecases. For some reason microsoft is really into tiny models and made big breakthroughs there.

https://reddit.com/r/LocalLLaMA/comments/1cdrw7p/what_are_the_potential_uses_of_small_less_than_3b/

https://github.com/microsoft/BitNet

https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/

https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft’s-newest-small-language-model-specializing-in-comple/4357090

iii@mander.xyz · 22 days ago

Converting free text to standardized forms such as json

MadMadBunny@lemmy.ca · 22 days ago

Oh—do you happen to have any recommendations for that?

iii@mander.xyz · 21 days ago

DeepSeek-R1-Distill-Qwen-1.5B

some_guy@lemmy.sdf.org · 21 days ago

I installed Llama. I’ve not found any use for it. I mean, I’ve asked it for a recipe because recipe websites suck, but that’s about it.

GreenKnight23@lemmy.world · 21 days ago

you can do a lot with it.

I heated my office with it this past winter.

entwine413@lemm.ee · edit-2 22 days ago

I’ve integrated mine into Home Assistant, which makes it easier to use their voice commands.

I haven’t done a ton with it yet besides set it up, though, since I’m still getting proxmox configured on my gaming rig.

Passerby6497@lemmy.world · 21 days ago

What are you using for voice integration? I really don’t want to buy and assemble their solution if I don’t have to

entwine413@lemm.ee · 21 days ago

I just use the companion app for now. But I am designing a HAL9000 system for my home.

shnizmuffin@lemmy.inbutts.lol · 19 days ago

[ A DIM SCREEN WITH ORANGE TEXT ]

Objective: optimize electrical bill during off hours.

... USER STATUS: UNCONSCIOUS 
... LIGHTING SYSTEM: DISABLED
... AUDIO/VISUAL SYSTEM: DISABLED 
... CLIMATE SYSTEM: ECO MODE ENABLED
... SURVEILLANCE SYSTEM: ENABLED 
... DOOR LOCKS: ENGAGED
... CELLULAR DATA: DISABLED
... WIRELESS ACCESS POINTS: DISABLED
... SMOKE ALARMS: DISABLED
... CO2 ALARMS: DISABLED
... FURNACE: SET TO DIAGNOSTIC MODE
... FURNACE_PILOT: DISABLED
... FURNACE_GAS: ENABLED

WARN: Furnace gas has been enabled without a Furnace pilot. Please consult the user manual to ensure proper installation procedure.

... FURNACE: POWERED OFF

Objective realized. Entering low power mode.

[ Cut to OP, motionless in bed ]

entwine413@lemm.ee · 19 days ago

Luckily my entire neighborhood doesn’t have gas and I have a heat pump.

But rest assured, I’m designing the system with 20% less mental illness

shnizmuffin@lemmy.inbutts.lol · 19 days ago

All systems need a little mental illness.

entwine413@lemm.ee · 19 days ago

It’s what keeps things fun! I don’t want a system that I don’t have to troubleshoot every once in a while.

MTK@lemmy.world · 21 days ago

Have you tried RAG? I believe that they are actually pretty good for searching and compiling content from RAG.

So in theory you could have it connect to all of you local documents and use it for quick questions. Or maybe connected to your signal/whatsapp/sms chat history to ask questions about past conversations

catty@lemmy.world · 21 days ago

No, what is it? How do I try it?

MTK@lemmy.world · 21 days ago

RAG is basically like telling an LLM “look here for more info before you answer” so it can check out local documents to give an answer that is more relevant to you.

You just search “open web ui rag” and find plenty kf explanations and tutorials

iii@mander.xyz · edit-2 21 days ago

I think RAG will be surpassed by LLMs in a loop with tool calling (aka agents), with search being one of the tools.

interdimensionalmeme@lemmy.ml · 21 days ago

LLMs that train LoRas on the fly then query themselves with the LoRa applied

ikidd@lemmy.world · 21 days ago

It’ll work for quick bash scripts and one-off things like that. But there’s not usually enough context window unless you’re using a 24G GPU or such.

catty@lemmy.world · 21 days ago

Yeah shell scripts are one of those things that you never remember how to do something and have to always look it up!

smayonak@lemmy.world · 21 days ago

Snippets are a great use.

I use StableCode on my phone as a programming tutor for learning Python. It is outstanding in both speed and in accuracy for this task. I have it generate definitions which I copy and paste into Anki the flashcard app. Whenever I’m on a bus or airplane I just start studying. Wish that it could also quiz me interactively.

catty@lemmy.world · 21 days ago

Please be very careful. The python code it’ll spit out will most likely be outdated, not work as well as it should (the code isn’t “thought out” as if a human did it.

If you want to learn, dive it, set yourself tasks, get stuck, and f around.

smayonak@lemmy.world · 20 days ago

I know what you mean. All the code generated with ai was loaded with problems. Specifically it kept forcing my api keys into the code without using environmental variables. But for basic coding concepts it has so far been perfect. even a 3b model seemingly generates great definitions

Mordikan@kbin.earth · 21 days ago

I’ve used smollm2:135m for projects in DBeaver building larger queries. The box it runs on is Intel HD 530 graphics with an old i5-6500T processor. Doesn’t seem to really stress the CPU.

UPDATE: I apologize to the downvoter for not masochistically wanting to build a 1000 line bulk insert statement by hand.

HiTekRedNek@lemmy.world · 21 days ago

How, exactly, do you have Intel HD graphics, found on Intel APUs, on a Ryzen AMD system?

Mordikan@kbin.earth · 21 days ago

Sorry, I was trying to find parts for my daughter’s machine while doing this (cheap Minecraft build). I corrected my comment.

swelter_spark@reddthat.com · 21 days ago

7b is the smallest I’ve found useful. I’d try a smaller quant before going lower, if I had super small vram.

irmadlad@lemmy.world · 21 days ago

As cool and neato as I find AI to be, I haven’t really found a good use case for it in the selfhosting/homelabbing arena. Most of my equipment is ancient and lacking the GPU necessary to drive that bus.

Ricky Rigatoni@retrolemmy.com · 20 days ago

I have it roleplay scenarios with me and sometimes I verbally abuse it for fun.

wise_pancake@lemmy.ca · 17 days ago

Weirdly I’m polite to all LLMs, but Gemini sets me off and I end up yelling at it.

Ricky Rigatoni@retrolemmy.com · 16 days ago

it’s just so pushy and hard to remove. it’s asking for abuse.

surph_ninja@lemmy.world · 20 days ago

Learning/practice, and any use that feeds in sensitive data you want to keep on-prem.

Unless you’re set to retire within the next 5 years, the best reason is to keep your resume up to date with some hands-on experience. With the way they’re trying to shove AI into every possible application, there will be few (if any) industries untouched. If you don’t start now, you’re going to be playing catch up in a few years.

herseycokguzelolacak@lemmy.ml · 20 days ago

for coding tasks you need web search and RAG. It’s not the size of the model that matters, since even the largest models find solutions online.

catty@lemmy.world · 20 days ago

Any suggestions for solutions?

herseycokguzelolacak@lemmy.ml · 19 days ago

Not on top of my head, but there must be something. llama.cpp and vllm have basically solved the inference problem for LLMs. What you need is a RAG solution on top that also combines it with web search.

wise_pancake@lemmy.ca · 17 days ago

Open webui lets you install a ton of different search providers out of the box, but you do need sn API key for most and I haven’t vetted them

I’m trying to get Kagi to work with Phi4 and not having success.

catty@lemmy.world · 17 days ago

Thanks, when I get some time soon, I’ll have another look at it and cherry ai with a local install of ollama

CrayonDevourer@lemmy.world · edit-2 21 days ago

Currently I’ve been using a local AI (a couple different kinds) to first - take the audio from a Twitch stream; so that I have context about the conversation, convert it to text, and then use a second AI; an LLM fed the first AIs translation + twitch chat and store ‘facts’ about specific users so that they can be referenced quickly for a streamer who has ADHD in order to be more personable.

That way, the guy can ask User X how their mothers surgery went. Or he can remember that User K has a birthday coming up. Or remember that User G’s son just got a PS5 for Christmas, and wants a specific game.

It allows him to be more personable because he has issues remembering details about his users. It’s still kind of a big alpha test at the moment, because we don’t know the best way to display the ‘data’, but it functions as an aid.

shnizmuffin@lemmy.inbutts.lol · 21 days ago

Hey, you’re treating that data with the respect it demands, right? And you definitely collected consent from those chat participants before you Hoover’d up their [re-reads example] extremely Personal Identification Information AND Personal Health Information, right? Because if you didn’t, you’re in violation of a bunch of laws and the Twitch TOS.

CrayonDevourer@lemmy.world · edit-2 21 days ago

If I say my name is Doo doo head, in a public park, and someone happens to overhear it - they can do with that information whatever they want. Same thing. If you wanna spew your personal life on Twitch, there are bots that listen to all of the channels everywhere on twitch. They aren’t violating any laws, or Twitch TOS. So, *buzzer* WRONG.

Right now, the same thing is being done to you on Lemmy. And Reddit. And Facebook. And everywhere else.

Look at a bot called “FrostyTools” for Twitch. Reads Twitch chat, Uses an AI to provide summaries of chat every 30 minutes or so. If that’s not violating TOS, then neither am I. And thousands upon thousands of people use FrostyTools.

I have the consent of the streamer, I have the consent of Twitch (through their developer API), and upon using Twitch, you give the right to them to collect, distribute, and use that data at their whim.

aksdb@lemmy.world · 21 days ago

So, buzzer WRONG.

Quite arrogant after you just constructed a faulty comparison.

If I say my name is Doo doo head, in a public park, and someone happens to overhear it - they can do with that information whatever they want. Same thing.

That’s absolutely not the same thing. Overhearing something that is in the background is fundamentally different from actively recording everything going on in a public space. You film yourself or some performance in a park and someone happens to be in the background? No problem. You build a system to identify everyone in the park and collect recordings of their conversations? Absolutely a problem, depending on the jurisdiction. The intent of the recording(s) and the reasonable expectations of the people recorded are factored in in many jurisdictions, and being in public doesn’t automatically entail consent to being recorded.

See for example https://www.freedomforum.org/recording-in-public/

(And just to clarify: I am not arguing against your explanation of Twitch’s TOS, only against the bad comparison you brought.)

kattfisk@lemmy.dbzer0.com · 21 days ago

You’re both getting side-tracked by this discussion of recording. The recording is likely legal in most places.

It’s the processing of that unstructured data to extract and store personal information that is problematic. At that point you go from simply recording a conversation of which you are a part, to processing and storing people’s personal data without their knowledge, consent, or expectation.

aksdb@lemmy.world · 21 days ago

True.

Although in Germany for example it can also be an issue when recording. If you have a security camera pointed at a public space (that can include the sidewalk infront of your house), passersby can sue you to take it down and potentially get you fined. Even pretending to constantly record such an area can yield that result.

tfm@europe.pub · 20 days ago

I’m not a lawyer but I suppose it would depend on the ToS and if the user agrees to the recording and processing. But if it allows the extraction of the real identity of the user it’s probably a GDPR issue.

shnizmuffin@lemmy.inbutts.lol · 20 days ago

This was my main thrust.

David J. Atkinson@c.im · 21 days ago

@kattfisk That seems to imply that you cannot personally listen to or watch recordings that you have made in public. In doing so, you are abstracting personal details that you might have missed before, refreshing your memory, and so on. What is the material difference between you doing this without machine help versus with automation that makes it ethically problematic? What if a friend helped you, not a machine?

shnizmuffin@lemmy.inbutts.lol · 20 days ago

What is the material difference between you doing this without machine help versus with automation that makes it ethically problematic?

Object permanence, perfect recall, data security and consent. It’s the difference between seeing someone naked vs taking a picture of someone naked.

Regardless - users, streamers, and developers are all prohibited from scraping and storing the Twitch chat.

CrayonDevourer@lemmy.world · edit-2 21 days ago

You build a system to identify everyone in the park and collect recordings of their conversations? Absolutely a problem, depending on the jurisdiction.

Literally not. The police use this right now to record your location and time seen using license plates all over the nation - with private corporations providing the service.

and being in public doesn’t automatically entail consent to being recorded.

And yes, it’s called ‘expectation to the right of privacy’. Public venues are not ‘private’ locations, and thus do not need consent. You can, quite literally, record anyone in public.

Even the link you provided agrees.

tfm@europe.pub · 20 days ago

In the US maybe but not in Germany, Austria and probably most countries in Europe.

catty@lemmy.world · 21 days ago

Doesn’t Twitch own all data that is written and their TOS will state something like you can’t store data yourself locally.

CrayonDevourer@lemmy.world · edit-2 20 days ago

I’m not storing their data. I’m feeding it to an LLM which infers things and storing that data. Other Twitch bots store twitch data too. Everything from birthdays to imaginary internet points.

catty@lemmy.world · 21 days ago

Was this system vibe coded? I get the feeling it was…

CrayonDevourer@lemmy.world · edit-2 20 days ago

There’s not actually that much code. It’s like 8 lines for an AI ‘agent’, and maybe another 16 lines for ‘tools’, and I’m using Streamlink for grabbing the audio stream, and pulseaudio has a ‘monitor’ device you can use to listen to what’s playing on the speakers. Throw it on a very minimal linux distro on a VM, and that’s it.

I don’t do ‘vibe coding’, but that IS where I got the idea from. People who are doing ‘vibe coding’ nowadays aren’t just plugging things into a generic AI, they’re spinning up ‘agents’ and making tools via MCP and then those agents are tasked with specific things, and use the tools to directly write to files, search the internet, read documents, etc

tfm@europe.pub · 20 days ago

I’d also consider writing a script with AI, which you don’t understand, as vibe coding. Basically if you wouldn’t be able to do it on your own it’s vibe coding.

catty@lemmy.world · 21 days ago

lol. Way to contradict yourself.

shnizmuffin@lemmy.inbutts.lol · 20 days ago

Let’s take a look at the Developer Agreement that you cited:

You must only retain chat logs as long as necessary for the operation of Your Services or to improve Your Services; do not do so for the purpose of creating public databases or websites, or, in general, to collect information about Twitch’s end users. You must enable, and process, all requests by end users to block, discontinue, delete, or otherwise opt-out of any retention of chat logs for Your Services.

This very clearly states that you are disallowed from retaining chat logs for the general purpose of collecting information about Twitch’s end users.

You said that you, “store ‘facts’ about specific users so that they can be referenced quickly,” but then later in a different thread state, “I’m not storing their data. I’m feeding it to an LLM which infers things and storing that data.” You’re retrieving information about specific users at a later time. You’ve built a database of structureless PII from chat logs. You’ve chosen to store the data as inferences, which makes it a bad database, but still a database.

I have questions:

When your streamer mentions something deeply personal, like, “how their mothers surgery went,” that your tool helped them remember, do they disclose that your tool was involved in that transaction? When the viewer gets weirded out and asks your streamer to not mention that again, or forget it entirely, do you have a way to remove that information from your database and a way to prove it’s been deleted? When other people in chat think it’s gross, and ask to opt-out, can you even do it?

Regarding FrostyTools: I don’t think it’s storing the chat logs for a later time. They don’t have a data retention section in their TOS or Privacy Policy that isn’t related to the streamer. (As in, they hold on to the streamer’s Twitch account and some other information for billing, authentication, etc.) I think it’s taking the chat logs only for as long as it needs to output a response and then deleting it. Also, this excerpt from the FrostyTools TOS made me chuckle:

This means that you, and not FrostyTools, are entirely responsible for all Content that you upload, post, email, transmit, stream, or otherwise make available via the Service. FrostyTools does not control the Content posted via the Service and, as such, does not guarantee the accuracy, integrity or quality of such Content. You understand that by using the Service, you may be exposed to Content that is offensive, indecent or objectionable. Under no circumstances will FrostyTools be liable in any way for any Content, including, but not limited to, any errors or omissions in any Content, or any loss or damage of any kind incurred as a result of the use of any Content posted, emailed, transmitted, streamed, or otherwise made available via the Service.

You agree that you must evaluate, and bear all risks associated with, the use of any Content, including any reliance on the accuracy, completeness, or usefulness of such Content. In this regard, you acknowledge that you may not rely on any Content created by the Service or submitted to the Service.

This leads me to believe that you can violate the Twitch TOS quoted above using FrostyTools. It is apparent that FrostyTools has positioned itself as an application that creates User Generated Content (like Photoshop or Word).

CrayonDevourer@lemmy.world · edit-2 19 days ago

You must only retain chat logs as long as necessary for the operation of Your Services or to improve Your Services

I’m not storing chat logs.

do not do so for the purpose of creating public databases or websites, or, in general, to collect information about Twitch’s end users.

Not creating any kind of public database either. It’s a private tool. Its purpose isn’t to massively-collect data about all of twitch either - it’s to provide reminders for social situations. If anything, it’s an accessibility tool for the disabled.

You must enable, and process, all requests by end users to block, discontinue, delete, or otherwise opt-out of any retention of chat logs for Your Services.

Again - Not storing chat logs. They are processed for information and that information inferred. I am storing reminders for the twitch streamer to talk about a certain subject at a certain time. If I put a reminder in my phone to remember to tell you happy birthday because I saw it on twitch; am I “creating a database of user information”? No. I’m creating a reminder for myself to remember to say happy birthday.

Having a computer help me remember those things isn’t a violation. Hell, even something like Microsoft’s new AI in windows does the same thing - are THEY violating twitch TOS when you have a browser window open? The answer is no.

When your streamer mentions something deeply personal, like, “how their mothers surgery went,” that your tool helped them remember, do they disclose that your tool was involved in that transaction?

No, nor should they be required to.

When the viewer gets weirded out and asks your streamer to not mention that again, or forget it entirely, do you have a way to remove that information from your database and a way to prove it’s been deleted? When other people in chat think it’s gross, and ask to opt-out, can you even do it?

When they mention not wanting to talk about something, that’s listed as something they don’t like to talk about, so in a way, yes.

Additionally, I instruct the ‘agent’ to disregard anything political or religious. - Though so far it’s not very good at distinguishing those things. Additionally it’s easy to feed it false information though it usually fixes it over time.

carl_dungeon@lemmy.world · 21 days ago

Most US states are single party consent. https://recordinglaw.com/united-states-recording-laws/one-party-consent-states/

interdimensionalmeme@lemmy.ml · 21 days ago

There is no expectation of privacy in public spaces. Participants to these streams which are open to all do not have a prohibition on repeating what they have heard.

kattfisk@lemmy.dbzer0.com · 21 days ago

Repeating what they heard is very different from automatically processing the chat to harvest personal information about the participants.

Just because some data is publicly available doesn’t mean all processing of that data is legal and moral.

interdimensionalmeme@lemmy.ml · 21 days ago

It is qualitatively equivalent. Any single piece of information could have been copied, it is safe to assume it has all been copied.

Although I would be onboard for supporting an expectation of pruvacy in public spaces and making private cctv recording illegal.

carl_dungeon@lemmy.world · 21 days ago

Right and what I was saying was even if it wasnt “public”, single party consent means the person recording can be that single party- so still a non-issue.

Hadowenkiroast@piefed.social · 21 days ago

sounds like salesforce for a twitch setting. cool use case, must make fun moments when he mentions such things.

jlow (he / him)@discuss.tchncs.de · 21 days ago

Esp. if the LLM just hallucinates 50% of the “facts” a about the users 👌

CrayonDevourer@lemmy.world · edit-2 21 days ago

That hasn’t been a problem at all for the 200+ users it’s tracking so far for about 4 months.

I don’t know a human that could ever keep up with this kind of thing. People just think he’s super personable, but in reality he’s not. He’s just got a really cool tool to use.

He’s managed some really good numbers because being that personal with people brings them back and keeps them chatting. He’ll be pushing for partner after streaming for only a year and he’s just some guy I found playing Wild Hearts with 0 viewers one day… :P

catty@lemmy.world · 21 days ago

Surely none of that uses a small LLM <= 3B?

CrayonDevourer@lemmy.world · edit-2 21 days ago

Yes. The small LLM isn’t retrieving data, it’s just understanding context of text enough to know what “Facts” need to be written to a file. I’m using the publicly released Deepseek models from a couple of months ago.

catty@lemmy.world · 21 days ago

Some questions and because you don’t actually understand, also, the answers.

what does the LLM understand the context of, (other user’s data owned by Twitch)
How is the LLM fed that data? (You store it and feed it to the LLM)
Do you use Twitch’s data and its users data through an AI without their consent? (Most likely, yes)
Do you have consent from the users to store ‘facts’ about them (You’re pissy, so obviously not)
Are you then storing that processed data? (Yes, you are, written to a file)
Is the purpose this data processing commercial (Yes, it is, designed to increase viewer count for the user of this system - and before you retort “OMG it helps twitch too”… Uhm no, Twitch has the viewers if not watching him, watching someone else)

I mean yeah, it’s a use case, but own up to the fact that you’re wrong. Or be pissy. I don’t care.

CrayonDevourer@lemmy.world · edit-2 20 days ago

So this wasn’t a post actually asking what a small LLM was good for, it was just an opportunity you could use to dump on LLM usage I take it. So this whole thing was made in bad faith?

With the comments about “vibe coding” and such, all it looks like you’re doing here is arguing the “merits” of how it’s being used, and you’re not interested in its actual usage at all.

Nobody is being pissy here except you. Small LLMs can be used for tasks such as this, and it doesn’t have to be twitch - It could be an assistant that you build for reminders in your personal life - using it on twitch is a minor detail that you seem to have latched onto because you just want to dump on LLM usage.

Go to /c/fuck_ai for that.

I gave you an example that it’s good for, and all you want to do is argue the merits of how I’m using it (even though it falls perfectly within Twitches TOS and use cases)

catty@lemmy.world · 20 days ago

You’re conflating me asking how to use these tools with you who’s misusing them. I see you still don’t accept what you’re doing is wrong. But go you.

ragingHungryPanda@lemmy.zip · 21 days ago

I’ve run a few models that I could on my GPU. I don’t think the smaller models are really good enough. They can do stuff, sure, but to get anything out of it, I think you need the larger models.

They can be used for basic things, though. There are coder specific models you can look at. Deepseek and qwen coder are some popular ones

scottrepreneur@lemmy.world · 21 days ago

Been coming to similar conclusions with some local adventures. It’s decent but not as able to process larger contexts.

catty@lemmy.world · 21 days ago

I haven’t actually found the coder-specific ones to be much (if at all) better than the generic ones. I wish I could have. Hopefully LLMs can become more efficient in the very near future.

hendrik@palaver.p3x.de · edit-2 21 days ago

I think that’s a size where it’s a bit more than a good autocomplete. Could be part of a chain for retrieval augmented generation. Maybe some specific tasks. And there are small machine learning models that can do translation or sentiment analysis, though I don’t think those are your regular LLM chatbots… And well, you can ask basic questions and write dialogue. Something like “What is an Alpaca?” will work. But they don’t have much knowledge under 8B parameters and they regularly struggle to apply their knowledge to a given task at smaller sizes. At least that’s my experience. They’ve become way better at smaller sizes during the last year or so. But they’re very limited.

I’m not sure what you intend to do. If you have some specific thing you’d like an LLM to do, you need to pick the correct one. If you don’t have any use-case… just run an arbitrary one and tinker around?