Explanations
statements of opinion or subjective claims
Negative Logits
Token      Logit
unj        -0.15
sting      -0.15
hec        -0.14
GOODMAN    -0.14
ace        -0.14
alc        -0.14
öl         -0.13
Princip    -0.13
Monter     -0.13
dissip     -0.13
Positive Logits
Token      Logit
antro       0.16
زر          0.16
ッツ        0.15
jedn        0.14
ippers      0.14
ubic        0.14
Umb         0.14
ysz         0.14
虎          0.14
umbo        0.13
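The two lists above read as the extremes of a per-token score. A minimal sketch of how such lists could be selected, assuming each value is the feature's effect on that token's logit and the dashboard shows the k most negative and k most positive entries (the function name `top_and_bottom` and the small input dict are hypothetical; the values are a subset copied from the tables, not the full vocabulary):

```python
def top_and_bottom(logit_effects, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: logit_effects maps each token string to the feature's
    effect on that token's logit. sorted() is stable, so tied values keep
    their input order in both lists (reverse=True also preserves stability).
    """
    items = list(logit_effects.items())
    negatives = sorted(items, key=lambda kv: kv[1])[:k]
    positives = sorted(items, key=lambda kv: kv[1], reverse=True)[:k]
    return negatives, positives


# Hypothetical subset of the values shown in the tables.
effects = {"unj": -0.15, "sting": -0.15, "antro": 0.16,
           "زر": 0.16, "umbo": 0.13, "Monter": -0.13}
neg, pos = top_and_bottom(effects, k=2)
print(neg)  # [('unj', -0.15), ('sting', -0.15)]
print(pos)  # [('antro', 0.16), ('زر', 0.16)]
```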
Activation Density: 0.089%
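An activation density of 0.089% means the feature fires on fewer than 1 in 1,000 token positions. A minimal sketch of the arithmetic, assuming density is the share of sampled tokens whose activation exceeds zero (the dashboard's exact threshold and token sample are not stated; `activation_density` and the sample data are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: 'active' means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 89 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 89_000, 1_000):  # 89 evenly spaced positions
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.089%
```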