    Explanations

    phrases related to legal violations or breaches of regulations

    Negative Logits
    bane        -0.18
    ÑģÑĤÑİ      -0.18
    stamp       -0.17
    iras        -0.16
    terra       -0.16
    nia         -0.15
    bay         -0.15
    emann       -0.14
    óln         -0.14
    æŀĿ         -0.14
    Positive Logits
    chk         0.16
    -hooks      0.16
    ĵn          0.15
    ensen       0.15
    utherford   0.14
    188         0.14
    ibold       0.14
    ÃŃt         0.13
    erce        0.13
    mainwindow  0.13
    Activation Density: 0.032%

    No Known Activations