INDEX
Explanations
phrases indicating disagreement or opposition to certain beliefs or actions
Negative Logits
Token       Logit
o           -0.15
æº          -0.15
æĤ          -0.15
.registry   -0.14
µ           -0.14
κο          -0.14
roperty     -0.14
hausen      -0.14
dex         -0.14
hail        -0.14
Positive Logits
Token       Logit
Argb        0.20
icient      0.19
elan        0.18
acket       0.17
žal         0.16
pii         0.16
ISR         0.15
495         0.15
akis        0.15
ackets      0.15
Activations Density: 0.078%