INDEX
    Explanations

    vulgar and offensive language

    Negative Logits
    kinson          -0.15
     Opport         -0.14
    ULSE            -0.14
    ecided          -0.14
    arken           -0.13
    yang            -0.13
    maal            -0.13
    yre             -0.13
    озд             -0.13
    stile           -0.13

    Positive Logits
    Wheel           0.16
    CommandEvent    0.15
    untu            0.15
    üzel            0.15
    é϶              0.14
     ordin          0.14
    Permanent       0.14
    BUM             0.14
     [&             0.13
    acen            0.13

    Act Density 0.094%

    No Known Activations