Explanations
expressions of dissatisfaction and criticism regarding behavior and ethics
Negative Logits
cum         -0.16
.swing      -0.15
neutral     -0.15
-neutral    -0.15
985         -0.15
ys          -0.15
umd         -0.14
ãĤ«ãĥ«      -0.14
bara        -0.14
deck        -0.14
Positive Logits
wake        0.16
λαν         0.15
otre        0.15
slee        0.14
Band        0.14
ÐĴÐŀ        0.13
(compact    0.13
ÙİÙĤ        0.13
ATYPE       0.13
abbo        0.13
Activations Density 0.288%
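
Some of the tokens listed above (for example ãĤ«ãĥ« and ÐĴÐŀ) look like the raw vocabulary strings of a byte-level BPE tokenizer rather than readable text. The sketch below assumes, without confirmation from this page, that they use the GPT-2-style byte-to-character mapping; the helper names byte_level_table and decode_token are hypothetical and exist only for illustration. Tokens containing characters outside that mapping (such as λαν) are returned unchanged.

```python
def byte_level_table():
    """Rebuild the GPT-2-style byte -> display-character table (assumed scheme)."""
    # Printable bytes map to themselves.
    printable = (list(range(ord("!"), ord("~") + 1))
                 + list(range(ord("¡"), ord("¬") + 1))
                 + list(range(ord("®"), ord("ÿ") + 1)))
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to U+0100 and up, in ascending byte order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_level_table().items()}


def decode_token(token):
    """Turn a byte-level vocabulary string back into readable text."""
    # If any character is outside the table, assume the token is already decoded.
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so replace bad bytes.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤ«ãĥ«", "ÐĴÐŀ", "wake"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĤ«ãĥ« decodes to カル and ÐĴÐŀ decodes to ВО, while plain ASCII tokens such as wake come back unchanged. If the underlying model uses a different tokenizer, these decodings do not apply.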