I’m sure it will be fixed soon, but I just realized how obsessed I am with the AI chat. I fear the day the site gets taken down one way or another. Is there any way to run the chat bot locally on our own computers, preferably with the same interface and the ability to load JSON files? Would a GTX 1060 be able to run it?
The Perchance AI chat can only be run on Perchance. But there are many alternatives out there, such as SillyTavern or Talemate. Both can connect to AI models either through API keys or to a model you run locally.
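For reference, most local backends (llama.cpp server, KoboldCpp, Ollama, etc.) expose an OpenAI-compatible endpoint that frontends like SillyTavern point at. Here's a minimal sketch of querying one directly; the port, URL, and model name are assumptions, so adjust them for whatever backend you actually run:

```python
# Minimal sketch: query a locally hosted model through an
# OpenAI-compatible chat endpoint. The URL/port and model name
# below are assumptions -- change them to match your backend.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",  # some backends ignore this field
        "messages": [
            {"role": "system", "content": "You are a roleplay character."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

A frontend like SillyTavern does essentially the same thing under the hood; you just paste the local URL (or an API key for a hosted service) into its connection settings.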
Well, then I have good news for you: the chat is down because it’s being upgraded after 1.5 years to the new Llama 3. So you just need to wait and be happy, because if we’re lucky and it’s Llama 3.3, we could get a 128k-token context instead of the shitty 4k tokens, or at least 10k or 20k, which would finally make lore and world-building possible, so goodbye goldfish memory issues. But let’s not get ahead of ourselves, lol. Speaking of running an LLM locally: I have an RTX 3060 and it takes a minute to get an answer, which is why I generate images locally with ComfyUI and use Perchance for roleplay… so on a GTX 1060 you are fucked.
The token length (context window) isn’t determined solely by the model currently in use. There also needs to be enough VRAM, which costs more money to host. Unless the dev finds some way to reduce VRAM usage and/or gets a better hosting deal with more VRAM, the context isn’t going to be increased just by changing the model used. Also, where did you hear it’s going to be Llama 3 or 3.3? Neither of those is much of an upgrade.
I know, but if a model can’t handle, let’s say, 10k tokens, expanding the current shitty 4k context to 10k will mean nothing; that’s why it’s important to have a model that supports a bigger token count. As for the new model being Llama 3, it’s a long story. Check other posts mentioning the update; the owner talked about it about half a year ago.
You clearly don’t know, otherwise you wouldn’t be saying that Llama 3 has a bigger token count. The two are not related. Usable context length is directly related to how much VRAM you throw at the model (barring special exceptions like Gemini 2.5 and some builds that use RoPE scaling to extend context).
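To put rough numbers on that: the KV cache (the memory the model needs per token of context, on top of the weights) grows linearly with context length. A back-of-the-envelope sketch, assuming a Llama-style 70B config (80 layers, 8 KV heads via GQA, head dim 128, fp16 cache) and a single chat with no cache quantization; real deployments will differ:

```python
# Rough KV-cache size estimate: shows why a longer context window costs
# extra VRAM regardless of which model is loaded. The 70B-style config
# below is an assumption, not the actual Perchance deployment.
def kv_cache_gib(context_len, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # 2x for separate key and value tensors, per sequence
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 1024**3

for ctx in (4_096, 10_000, 20_000, 131_072):
    print(f"{ctx:>7} tokens -> ~{kv_cache_gib(ctx):.1f} GiB of KV cache per chat")
```

Under those assumptions that’s roughly 1.25 GiB at 4k tokens versus about 40 GiB at 128k, per concurrent chat, which is why “just use the model with the bigger advertised context” doesn’t make the memory bill go away.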
I also took a look at the dev’s post history. At no point do they mention what model they plan to use. The only references to Llama are old mentions of the “Local Llama” subreddit and a statement that the current model is a popular 70B variant. That’s it.


