    Negative Logits
    Token           Logit
    \b              -0.07
     är             -0.06
     heg            -0.06
     sto            -0.06
     confidence     -0.06
    .argument       -0.06
     thinks         -0.06
     laat           -0.06
     Noir           -0.06
     humour         -0.06
    Positive Logits
    Token           Logit
     Awake          0.07
     日本             0.07
     awake          0.06
     waking         0.06
     Concepts       0.06
     awakening      0.06
     awakened       0.06
    ));↵↵↵          0.06
    GROUND          0.06
    updates         0.06
    Activation Density: 0.025%

    No Known Activations