• 0 Posts
  • 62 Comments
Joined 1 year ago
Cake day: October 6th, 2024

  • What you’re suggesting is something called an “editor phase”; some CoT models (chain of thought, commonly known as “thinking” models) do this to an extent. It’s also something that can be done via JavaScript in the current AAC right now. To do this in AAC, you can either fork the current chat and make your own changes, or you can leverage the JavaScript function on characters to have the AI respond twice: once to perform a standard completion, then again, passing the entire chat back, this time including the last response along with instructions on what to “edit”. The AI produces a new response, and you replace the last response with the edited one. AI isn’t actually capable of self-analysis or of thinking about what it’s going to do as it’s responding, so to help mitigate this, you have to break things into steps (a rough sketch of the two-pass idea is below).
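
    A minimal sketch of that two-pass flow, assuming a hypothetical `generate(messages)` helper rather than any real AAC or model API:

```typescript
// Hypothetical two-pass "editor phase": the names below (Message, generate)
// are illustrative assumptions, not an actual AAC or model API.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Placeholder for whatever actually sends the assembled prompt to the model.
declare function generate(messages: Message[]): Promise<string>;

async function respondWithEditorPass(chat: Message[]): Promise<string> {
  // Pass 1: normal completion over the existing chat.
  const draft = await generate(chat);

  // Pass 2: send the whole chat again, now including the draft,
  // plus explicit instructions on what to "edit".
  const editPrompt: Message[] = [
    ...chat,
    { role: "assistant", content: draft },
    {
      role: "user",
      content:
        "Revise your last response: fix contradictions with the chat history " +
        "and remove anything that breaks character. Return only the revised reply.",
    },
  ];

  // Replace the draft with the revised response.
  return await generate(editPrompt);
}
```

    The point is that the model never edits “in place”; it just gets asked a second time with the draft and the edit instructions included in the prompt.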


  • AI is stateless, which means it’s seeing everything for the first time every time it responds. Writing instructions, characters, lore, reminders, your reply, chat history: all of it is just sent as one big block of text, with a small header for each section. The AI then has to parse through all of it, try to make sense of it, and come up with a response to send back to you. What you’re asking for is just another block of text to send along with everything else, really no different from a reminder (there’s a rough sketch of how that assembly looks after this comment).

    This is a simple limitation of the current AI model and it will improve (probably by a fairly good amount) with the text upgrade, but it still won’t be perfect because AI doesn’t actually understand.

    You’re providing the AI with information that sounds like China, which means the AI is going to look at the data it’s given, see stuff that sounds like China, note that that stuff is related to China, and so it’s going to talk about China.
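
    As an illustration of that “big block of text” point, here’s a minimal sketch of how such a prompt might be assembled; the section names and the `buildPrompt` function are assumptions for illustration, not the platform’s actual format:

```typescript
// Illustrative only: the section names and layout are assumptions,
// not the actual prompt format used by any particular platform.
interface PromptSections {
  instructions: string;
  character: string;
  lore: string;
  reminder: string;
  chatHistory: string[]; // prior turns, already formatted as text
  userReply: string;
}

// Everything gets flattened into one block of text with a small
// header per section; the model sees this fresh on every request.
function buildPrompt(s: PromptSections): string {
  return [
    "## Instructions\n" + s.instructions,
    "## Character\n" + s.character,
    "## Lore\n" + s.lore,
    "## Reminder\n" + s.reminder,
    "## Chat history\n" + s.chatHistory.join("\n"),
    "## Your reply\n" + s.userReply,
  ].join("\n\n");
}
```

    Any extra rule you add just becomes one more section in that same block, which is why it behaves like a reminder rather than a hard constraint.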

  • You clearly don’t know, otherwise you wouldn’t be saying that Llama 3 has a bigger token count. The two are not related. Token count (context length) is directly related to how much VRAM you throw at the model (barring special exceptions like Gemini 2.5 and some special builds using RoPE to extend context); there’s a rough back-of-the-envelope sketch at the end of this comment.

    I also took a look at the dev’s post history. At no point do they mention what model they plan to use. The only references to Llama are old mentions of the “Local Llama” subreddit and a statement that the current model is a popular 70B variant. That’s it.
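
    To illustrate the VRAM/context relationship, here’s a back-of-the-envelope KV-cache estimate; the parameter values are illustrative assumptions roughly in line with a 70B-class model using grouped-query attention, not figures for any specific deployment:

```typescript
// Rough KV-cache memory estimate: the cache grows linearly with context length,
// which is why usable token count is mostly a VRAM question on top of the weights.
// All parameter values below are illustrative assumptions for a 70B-class model.
function kvCacheGiB(
  contextTokens: number,
  layers = 80,        // transformer layers (assumed)
  kvHeads = 8,        // grouped-query attention KV heads (assumed)
  headDim = 128,      // dimension per head (assumed)
  bytesPerValue = 2   // fp16/bf16
): number {
  // 2x for keys and values, per layer, per KV head, per head dimension, per token.
  const bytes = 2 * layers * kvHeads * headDim * bytesPerValue * contextTokens;
  return bytes / 1024 ** 3;
}

// Doubling the context roughly doubles the cache, before counting the model weights.
console.log(kvCacheGiB(8_192).toFixed(1));   // ~2.5 GiB
console.log(kvCacheGiB(32_768).toFixed(1));  // ~10.0 GiB
console.log(kvCacheGiB(131_072).toFixed(1)); // ~40.0 GiB
```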