INDEX
    Explanations

    negative or contradictory language

    NEGATIVE LOGITS
    ACHI     -0.16
    ah       -0.15
    ahr      -0.15
    Ñĩи      -0.14
     Spit    -0.13
    ç¨       -0.13
    atte     -0.13
    Ãij      -0.13
    iver     -0.13
     lan     -0.13
    POSITIVE LOGITS
    ãĥ³ãĥĨãĤ£    0.17
    ODB          0.16
    iron         0.16
    gba          0.15
    XB           0.14
    kup          0.14
    GBK          0.14
    bomb         0.14
    ánu          0.14
    IRON         0.14
    Activation Density: 0.008%

    No Known Activations