    Explanations

    Negation and critical feedback

    Negative Logits
    Token              Logit
    (blank)            -0.07
    "orc"              -0.07
    (blank)            -0.07
    "obic"             -0.07
    " validating"      -0.06
    "\Persistence"     -0.06
    " }↵↵↵"            -0.06
    (blank)            -0.06
    "kün"              -0.06
    "];//"             -0.06
    Positive Logits
    Token                  Logit
    " pursuits"            0.06
    "gs"                   0.06
    "	Block"               0.06
    ".InvariantCulture"    0.06
    " foil"                0.06
    " Luo"                 0.06
    "’ї"                   0.06
    " Pot"                 0.06
    " Nazis"               0.05
    "ウォ"                 0.05
    Activation Density: 0.162%

    No Known Activations