    Explanations

    expressions related to punishment and aggression

    Negative Logits
    Token          Logit
    "fel"          -0.15
    "è²»"           -0.15
    "ToStr"        -0.15
    " Kens"        -0.14
    "209"          -0.14
    "411"          -0.14
    "ochen"        -0.14
    " Ladies"      -0.14
    "yn"           -0.14
    "lad"          -0.13
    Positive Logits
    Token          Logit
    "_operand"      0.18
    "uger"          0.15
    "üz"            0.15
    "ìĺģ"           0.15
    "rawer"         0.14
    "ugas"          0.14
    "jez"           0.14
    " priest"       0.14
    "odom"          0.13
    "utenberg"      0.13
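    The logit lists rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal sketch of how such lists can be produced, assuming (the page does not say) that each token's logit effect is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The helpers `logit_effects` and `top_and_bottom` and the toy inputs are hypothetical; the toy values are borrowed from the lists for illustration only.

```python
# Assumption: a feature's effect on token t's logit is
#   sum_i decoder_dir[i] * unembed[i][t]   (decoder direction times unembedding).
def logit_effects(decoder_dir, unembed):
    """Return one logit effect per vocabulary token.

    decoder_dir: list of d_model floats (hypothetical feature direction)
    unembed: d_model rows, each a list of vocab_size floats
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k=10):
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["_operand", "fel", " priest"]
effects = logit_effects([1.0, 0.0], [[0.18, -0.15, 0.0], [5.0, 5.0, 5.0]])
negative, positive = top_and_bottom(effects, vocab, k=1)
print(negative)  # [('fel', -0.15)]
print(positive)  # [('_operand', 0.18)]
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other; a real dashboard would run this over the full vocabulary with k = 10.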
    Activation Density: 0.258%
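    Activation density is the share of sampled tokens on which the feature is active; 0.258% corresponds to roughly 258 firing tokens per 100,000. A minimal sketch, assuming "active" means an activation strictly above zero (the page does not state its threshold); `activation_density` is a hypothetical helper.

```python
def activation_density(activations):
    """Return the percentage of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Toy check: 258 firing tokens out of 100,000 reproduces the density shown above.
acts = [0.0] * 99_742 + [1.0] * 258
print(activation_density(acts))  # 0.258
```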

    No Known Activations