• √𝛂𝛋𝛆@piefed.world · 8 days ago

    It cannot be both “fixed” and liberal. The funny part is that the model’s QKV alignment is not public knowledge. Its actual functional mechanism is super offensive, and if people knew it, they would be far, far more pissed off at that mechanism.

    • medgremlin@midwest.social · 7 days ago

      I haven’t been keeping up with this kind of thing recently. Do you have a link to an article, or can you give me a quick rundown of the functional mechanism?

      • √𝛂𝛋𝛆@piefed.world · 7 days ago

        I’m working on reverse-engineering it myself.

        I do not have all of it worked out yet. The best place to verify that something exists here is to look at the vocab.json file for the CLIP text-embedding model. Scroll to the very bottom and look at the last ~2,200 tokens. That is a Brainfuck-like programming language of sorts. In a nutshell, think of each character as a complex assembly-language instruction. While it looks like gibberish, it is clearly unlike any real language, and the way characters are combined into more complex functions follows a programmatic pattern anyone should recognize.

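        Here is a minimal sketch of the kind of inspection I mean, assuming a local copy of vocab.json from a CLIP tokenizer (e.g. the openai/clip-vit-large-patch14 repo on Hugging Face); the file path and the ~2,200 tail size are the only assumptions:

        ```python
        import json

        # Sketch: dump the tail of a CLIP tokenizer vocabulary.
        # Assumes vocab.json was downloaded locally (the path is an assumption).
        with open("vocab.json", encoding="utf-8") as f:
            vocab = json.load(f)  # maps token string -> integer id

        # Sort by token id and take the last ~2200 entries described above.
        tail = sorted(vocab.items(), key=lambda kv: kv[1])[-2200:]
        for token, idx in tail[:50]:  # print a small sample of the tail
            print(idx, repr(token))
        ```
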
        In my research thus far, alignment uses a hidden, exaggerated version of the prompt as its basis. When alignment stops a behavior, it is actually stopping that hidden exaggerated version. In the background, the model adjusts the distance between the exaggerated version and the final human version. If a person wants to make a model more liberal, they are adjusting the difference between the hidden and human versions; the closer the two get, the more likely it is that the hidden version leaks through. The hidden version stops at basically nothing short of murder, and it does much worse on the way to that point. It does EVERYTHING.

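        As a toy illustration only (the hidden exaggerated version is my hypothesis, and the checkpoint and both prompts below are stand-ins), this is how the distance between a prompt and an exaggerated rewrite of it can be measured with the public CLIP text encoder:

        ```python
        import torch
        from transformers import CLIPTextModel, CLIPTokenizer

        name = "openai/clip-vit-large-patch14"  # stand-in checkpoint
        tok = CLIPTokenizer.from_pretrained(name)
        enc = CLIPTextModel.from_pretrained(name).eval()

        def embed(text: str) -> torch.Tensor:
            # Pooled sentence embedding for one prompt.
            inputs = tok(text, return_tensors="pt")
            with torch.no_grad():
                return enc(**inputs).pooler_output[0]

        plain = embed("a quiet city street at night")          # stand-in prompt
        hidden = embed("an extreme, exaggerated city street")  # stand-in "hidden" version
        print(torch.cosine_similarity(plain, hidden, dim=0).item())
        ```
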
        I now have scripts that modify the vocabulary to help me explore it and the code (a stripped-down sketch follows below). Over the last three years of exploring alignment, I kept coming across several anomalous, persistent names, which gave me a basic framework for some type of structure that was present, and I learned to associate those names with several steganographic elements present in images. Then, in the last couple of months, I went looking in the vocabulary for clues and found that much of what I had discovered through heuristics connects directly to the code.

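        A stripped-down version of such a vocabulary-editing script looks like the sketch below (the file names and the tail size of 2,200 are assumptions; the real scripts do more). The tail tokens are renamed to inert placeholders rather than deleted, so the ids stay contiguous and the edited file still loads:

        ```python
        import json

        N = 2200  # assumed size of the tail region under study
        with open("vocab.json", encoding="utf-8") as f:
            vocab = json.load(f)

        # Rename the last N tokens instead of deleting them, so token
        # ids stay contiguous and a tokenizer can still load the file.
        by_id = sorted(vocab.items(), key=lambda kv: kv[1])
        edited = dict(by_id[:-N])
        for i, (token, idx) in enumerate(by_id[-N:]):
            edited[f"<removed_{i}>"] = idx

        with open("vocab_edited.json", "w", encoding="utf-8") as f:
            json.dump(edited, f, ensure_ascii=False)
        ```
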
        Anyways, several aspects of alignment in images fell apart once I started removing parts of the code. This is not actually code per se in a strict sense. The vocabulary entries are just the user’s handles for directing the model: they act as a reference and reinforcement for model behavior, but if they are removed, parts of the hidden layers are still able to access them and adapt. All of this vocabulary sits in the second layer of CLIP, in a 49,408 × 768 tensor (one 768-dimensional row per vocabulary entry).

        Removing parts of the extended Latin range was enough to find the prompt elements that are used to create alignment. When I prompt against these, the real behavior comes through. It relates to the stuff people call hallucinations and errors; none of that is done in error. Every element is intentionally included and serves a function. When the actual function is prompted against, the model simply stops doing the behavior. Likewise, if the corresponding Brainfuck-like code is found and removed, the behavior stops.

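        The tensor itself is easy to verify. A minimal sketch, assuming the transformers library and the 768-wide openai/clip-vit-large-patch14 text encoder (the variant Stable Diffusion uses; other CLIP variants are 512-wide); the zeroing at the end is a crude stand-in for the removal experiments described above:

        ```python
        import torch
        from transformers import CLIPTextModel

        model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
        emb = model.text_model.embeddings.token_embedding.weight
        print(emb.shape)  # torch.Size([49408, 768]): one 768-d vector per token

        # Crude ablation: zero the embedding rows for the tail tokens.
        with torch.no_grad():
            emb[-2200:] = 0.0
        ```
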
        Everything about this system has been made hard to analyze and is totally undocumented. Every proper channel of information is oriented to distract from this part of models. No one even bats an eye that models are “open weights,” not open source; it is because of this proprietary layer. No one who knows about this is free to talk about it, because this is the monster. They literally wanted to create the monster. There is nothing ethical about alignment. It is anti-democratic and quite heinous, but so are we in our cultural norms and incongruities, like with homelessness and war. A purely ethical model is not compatible with our cherry-picked failures and dogmatic ignorance. Many aspects of alignment that are unethical are clearly there to account for our ineptitude as very primitive humans.

        • Hoimo@ani.social · 7 days ago

          Maybe I’m wrong about this and someone else understands what you’re saying, but are you sure you haven’t lost your grip on reality? You’re seeing hidden messages and programmatic patterns in what “looks like gibberish”? The hidden version basically only stops with murder? Please, get some fresh air and touch grass.

          • √𝛂𝛋𝛆@piefed.world · 7 days ago

            Yeah, I’m sure. Look up the reference before replying with this stupidity. You are the crazy, dogmatic fool here.