I’m sure it will be fixed soon, but I just realized how obsessed I am with the AI chat. I fear the day the site gets taken down one way or another. Is there any way to run the chat bot locally on our own computers, preferably with the same interface and the ability to load JSON files? Would a GTX 1060 be able to run it?
The Perchance AI chat can only be run on Perchance. But there are many alternatives out there, such as SillyTavern or Talemate. Both can connect to AI models either through API keys or to a model you run locally.
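For reference, most local backends (llama.cpp server, KoboldCpp, Ollama, etc.) expose an OpenAI-compatible endpoint that frontends like SillyTavern point at. Here's a minimal sketch of querying one directly; the port, URL, and model name are assumptions, so adjust them for whatever backend you actually run:

```python
# Minimal sketch: query a locally hosted model through an
# OpenAI-compatible chat endpoint. The URL/port and model name
# below are assumptions -- change them to match your backend.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",  # some backends ignore this field
        "messages": [
            {"role": "system", "content": "You are a roleplay character."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

A frontend like SillyTavern does essentially the same thing under the hood; you just paste the local URL (or an API key for a hosted service) into its connection settings.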
Well, then I have good news for you: the chat is down because it’s being upgraded after 1.5 years to the new Llama 3. So you just need to wait and be happy, because if we’re lucky and it’s Llama 3.3, we could get a 128k-token context instead of the shitty 4k tokens, or at least 10k or 20k, which would finally make lore and world-building possible, so goodbye goldfish memory issues. But let’s not get ahead of ourselves, lol. Speaking of running an LLM locally: I have an RTX 3060 and it takes a minute to get an answer, which is why I generate images locally with ComfyUI and use Perchance for roleplay… so on a GTX 1060 you are fucked.
The token length (context window) isn’t determined solely by the model currently in use. There also needs to be enough VRAM, which costs more money to host. Unless the dev finds some way to reduce VRAM usage and/or gets a better hosting deal with more VRAM, the context isn’t going to be increased just by changing the model used. Also, where did you hear it’s going to be Llama 3 or 3.3? Neither of those is much of an upgrade.
I know, but if a model can’t handle, let’s say, 10k tokens, expanding the current shitty 4k context to 10k will mean nothing; that’s why it’s important to have a model that supports a bigger token count. As for the new model being Llama 3, it’s a long story. Check other posts mentioning the update; the owner talked about it about half a year ago.
You clearly don’t know, otherwise you wouldn’t be saying that Llama 3 has a bigger token count. The two are not related. Usable context length is directly related to how much VRAM you throw at the model (barring special exceptions like Gemini 2.5 and some builds that use RoPE scaling to extend context).
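To put rough numbers on that: the KV cache (the memory the model needs per token of context, on top of the weights) grows linearly with context length. A back-of-the-envelope sketch, assuming a Llama-style 70B config (80 layers, 8 KV heads via GQA, head dim 128, fp16 cache) and a single chat with no cache quantization; real deployments will differ:

```python
# Rough KV-cache size estimate: shows why a longer context window costs
# extra VRAM regardless of which model is loaded. The 70B-style config
# below is an assumption, not the actual Perchance deployment.
def kv_cache_gib(context_len, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # 2x for separate key and value tensors, per sequence
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 1024**3

for ctx in (4_096, 10_000, 20_000, 131_072):
    print(f"{ctx:>7} tokens -> ~{kv_cache_gib(ctx):.1f} GiB of KV cache per chat")
```

Under those assumptions that’s roughly 1.25 GiB at 4k tokens versus about 40 GiB at 128k, per concurrent chat, which is why “just use the model with the bigger advertised context” doesn’t make the memory bill go away.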
I also took a look at the dev’s post history. At no point do they mention what model they plan to use. The only references to Llama are old mentions of the “Local Llama” subreddit and a statement that the current model is a popular 70B variant. That’s it.


