INDEX
    Explanations

    phrases expressing threats or harmful intentions

    Negative Logits
    akan          -0.16
    OLUMNS        -0.15
    reau          -0.15
    asi           -0.15
    etic          -0.15
    amel          -0.14
     __("         -0.14
    -END          -0.14
    overrides     -0.14
    _featured     -0.14
    Positive Logits
    arih          0.17
    orgh          0.16
    OrNull        0.16
    krom          0.16
    ôi            0.15
    engo          0.15
    CTest         0.14
    urum          0.14
    ovit          0.14
     Gulf         0.14
    Activation Density: 0.334%

    No Known Activations