Explanations
Expressions indicating dishonesty or manipulation in the context of work or societal issues
Negative Logits
ikk                 -0.15
enville             -0.14
eer                 -0.14
uki                 -0.14
ycastle             -0.14
loon                -0.14
byn                 -0.14
ازل                 -0.13
uku                 -0.13
(*(                 -0.13
Positive Logits
Vice                 0.18
vice                 0.17
vice                 0.17
alem                 0.16
hait                 0.15
Secondary            0.14
hof                  0.14
ργ                   0.14
Representative       0.14
.setViewportView     0.14
Activation Density: 0.083%
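Activation density is usually reported as the share of sampled tokens on which the feature's activation is above zero. Assuming that definition, 0.083% means the feature fires on roughly 1 token in 1,200 (1 / 0.00083 ≈ 1,205). The sketch below computes the metric. The helper name `activation_density`, the zero threshold, and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 10,000 token positions, feature fires on 8 of them
acts = [0.0] * 10_000
for i in (12, 480, 1_903, 2_500, 4_071, 6_666, 8_020, 9_999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")      # 0.080%
print(round(1 / 0.00083))    # 1205, i.e. about 1 token in 1,200 at 0.083%
```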