Explanations
phrases indicating blame or accusation
Negative Logits
LOCAL     -0.07
ANGES     -0.06
pellier   -0.06
ruk       -0.06
LOCAL     -0.06
лок       -0.06
hev       -0.06
roj       -0.06
DNA       -0.06
Local     -0.06
Positive Logits
ugins           0.08
ugin            0.08
ington          0.07
dfa             0.07
olini           0.06
metis           0.06
Revolutionary   0.06
verb            0.06
æ°              0.06
NEC             0.06
Activations Density: 0.000%