    Explanations

    words and phrases related to provocation and provocative speech

    Negative Logits
    utex       -0.14
    mad        -0.14
    erez       -0.14
    ern        -0.14
     Rip       -0.14
    optera     -0.14
    ule        -0.13
    絡         -0.13
    ensch      -0.13
    ä          -0.13
    Positive Logits
    žen         0.18
    /assert     0.17
    eyin        0.16
    CHASE       0.16
    čin         0.15
    ząd         0.15
    lint        0.14
    nutím       0.14
    ラック      0.14
    ník         0.14
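    Dashboards like this one often display tokens in the byte-level form used by GPT-2-style BPE tokenizers. In that form each raw byte is mapped to a printable Unicode character, so `Äįin` is really the token `čin`. The sketch below assumes that GPT-2 `bytes_to_unicode` mapping applies and turns such display strings back into readable text. The function names are illustrative only.

```python
def bytes_to_unicode():
    """GPT-2-style map from each byte (0-255) to a printable Unicode character."""
    # Printable bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Inverse map: display character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_display_token(token: str) -> str:
    """Convert a byte-level display string (hypothetical helper) to UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

for t in ["Äįin", "zÄħd", "nutÃŃm", "ãĥ©ãĥĥãĤ¯", "nÃŃk"]:
    print(t, "->", decode_display_token(t))
```

The `errors="replace"` choice matters because a single BPE token can hold only part of a multi-byte character. A strict decode would fail on those tokens, so they are shown with a replacement character instead.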
    Activation Density 0.006%
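    Activation density is commonly reported as the fraction of token positions on which a feature fires. Here is a minimal sketch that assumes "fires" means an activation strictly above a threshold of zero. The `threshold` parameter and the function name are hypothetical. The sketch shows how a figure such as 0.006% comes about: 6 active positions out of 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of sequences, each a list of per-token activations.
    """
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    return active / total if total else 0.0

# Toy data: one sequence of 100,000 tokens in which the feature fires 6 times.
acts = [[0.0] * 99_994 + [1.0] * 6]
print(f"{activation_density(acts) * 100:.3f}%")
```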

    No Known Activations