INDEX
    Explanations

    subjective claims or opinions about people's behavior or situations

    Negative Logits
    " scratch"    -0.16
    "awah"        -0.15
    "acomment"    -0.15
    "yer"         -0.15
    "ying"        -0.15
    " Kurum"      -0.15
    "ssa"         -0.14
    "евич"        -0.14
    "ynos"        -0.14
    "hud"         -0.14
    Positive Logits
    "نی"          0.15
    " interven"   0.15
    "tron"        0.15
    " repl"       0.14
    "zac"         0.14
    "Crud"        0.14
    "URLRequest"  0.14
    "izo"         0.14
    " zav"        0.13
    "uke"         0.13
    Activation Density: 0.112%

    No Known Activations