INDEX
    Explanations

    expressions indicating dishonesty or manipulation in contexts of work or societal issues

    Negative Logits
    "ikk"              -0.15
    "enville"          -0.14
    "eer"              -0.14
    "uki"              -0.14
    "ycastle"          -0.14
    "loon"             -0.14
    "byn"              -0.14
    "ازل"              -0.13
    "uku"              -0.13
    " (*("             -0.13
    Positive Logits
    " Vice"            0.18
    "vice"             0.17
    " vice"            0.17
    "alem"             0.16
    "hait"             0.15
    " Secondary"       0.14
    "hof"              0.14
    "ργ"               0.14
    " Representative"  0.14
    ".setViewportView" 0.14
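Logit lists like the two above are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method and ignores layer norm and biases. The helpers `logit_effects` and `top_and_bottom`, and the toy vectors and token names, are hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Tuple

# Hypothetical helper (assumed method): the logit effect of a feature on a
# token is the dot product of the feature's decoder direction with that
# token's unembedding vector. Layer norm and biases are ignored.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    if any(len(vec) != len(decoder_dir) for vec in unembed.values()):
        raise ValueError("unembedding vectors must match the decoder dimension")
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

# Hypothetical helper: split tokens into the k most promoted (positive)
# and the k most suppressed (negative) by their logit effect.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Ascending sort: the head holds the most negative effects,
    # the tail holds the most positive ones.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative

# Toy example: 2-dimensional residual stream, three made-up tokens.
decoder_dir = [1.0, 0.0]
unembed = {"tok_a": [0.5, 0.3], "tok_b": [-0.25, 0.9], "tok_c": [0.125, -0.4]}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(positive)  # [('tok_a', 0.5)]
print(negative)  # [('tok_b', -0.25)]
```

With a real model, `unembed` would hold one vector per vocabulary entry, and `k` would be 10 to match the lists above.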
    Activation density: 0.083%
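Activation density is usually reported as the percentage of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The helper `activation_density` and its `threshold` parameter are hypothetical, not this dashboard's own code.

```python
from typing import List

# Hypothetical helper (assumed definition): percentage of token positions
# whose activation is strictly greater than `threshold`.
def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Ten token positions, one of which fires.
print(activation_density([0.0] * 9 + [2.5]))  # 10.0
```

At the 0.083% shown here, the feature would fire on roughly 830 of every 1,000,000 sampled tokens, or about 1 token in 1,200.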

    No Known Activations