    Explanations

    instances of the word "abuse" and related terms

    Negative Logits
    æı          -0.15
    оби         -0.14
    uture       -0.14
    iliz        -0.14
    シア        -0.13
    تان         -0.13
    ari         -0.13
    stad        -0.13
    anders      -0.13
    ling        -0.13
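Token strings such as `ãĤ·ãĤ¢` are raw byte-level BPE vocabulary entries. In these entries each UTF-8 byte appears as a printable stand-in character, so `ãĤ·ãĤ¢` is the bytes `E3 82 B7 E3 82 A2`, which is シア. `æı` holds only the first two bytes of a three-byte character, so it has no complete text form. The sketch below turns such strings back into text. It assumes the model uses GPT-2's byte-to-unicode mapping, which this page does not state. `decode_token` and `BYTE_DECODER` are illustrative names.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2-style reversible map from every byte value to a printable character."""
    # Bytes that are already printable keep their own code point:
    # 0x21-0x7E, 0xA1-0xAC and 0xAE-0xFF.
    byte_values = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    code_points = byte_values[:]
    # Every remaining byte (controls, space, 0x7F-0xA0, 0xAD) is shifted to 256 + n.
    n = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(cp) for b, cp in zip(byte_values, code_points)}


# Inverse map: stand-in character -> original byte value (hypothetical helper name).
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into UTF-8 text.

    A token holding only part of a multi-byte character decodes to U+FFFD.
    """
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ·ãĤ¢"))        # シア
print(repr(decode_token("ĠAnsi")))   # ' Ansi' (Ġ stands for a leading space)
```

Under this mapping a leading space is stored as `Ġ`. This explains why tokens such as ` Ansi` carry a space at the front.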
    Positive Logits
    amac        0.18
    fully       0.16
     مقد        0.14
    ena         0.13
    antly       0.13
    733         0.13
    ASON        0.13
     Ansi       0.13
    ongyang     0.13
    builtin     0.13
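Dashboards of this kind typically build the positive and negative logit tables in a specific way. They project the feature's decoder direction through the model's unembedding matrix and keep the tokens with the largest and smallest resulting values. The sketch below assumes that method. The decoder vector `direction`, the matrix `W_U` and the toy values are all hypothetical inputs, not data from this page. Plain lists keep the example self-contained, although real implementations would use a tensor library.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a residual-stream direction through the unembedding.

    W_U[i][j] is indexed by residual dimension i and vocabulary entry j,
    so the result has one logit contribution per vocabulary entry.
    """
    n_vocab = len(W_U[0])
    return [sum(direction[i] * W_U[i][j] for i in range(len(direction)))
            for j in range(n_vocab)]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical values).
direction = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0, 0.3],
       [0.1, 0.4, -0.2, 0.0]]
vocab = ["a", "b", "c", "d"]

effects = logit_effects(direction, W_U)
neg, pos = top_and_bottom(effects, vocab, k=2)
print(neg)  # tokens "b" then "c"
print(pos)  # tokens "d" then "a"
```

A table built this way shows which output tokens the feature pushes up or down when it fires. It does not show which inputs make it fire.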
    Activation Density: 0.011%
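Activation density is usually the share of token positions on which the feature fires, given as a percentage. A density of 0.011% is about 11 firing positions per 100,000 tokens. The sketch below assumes that "fires" means an activation above zero; `activation_density` and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


# Hypothetical sample: 11 firing positions out of 100,000 tokens.
acts = [2.5] * 11 + [0.0] * (100_000 - 11)
print(f"{activation_density(acts):.3f}%")  # 0.011%
```

A very low density like this means the feature fires on only a handful of tokens in a large sample.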

    No Known Activations