Explanations
special characters or unusual diacritics in text
Negative Logits
ware      -0.17
asca      -0.16
arsi      -0.16
com       -0.15
ous       -0.15
о         -0.15
orch      -0.15
fy        -0.15
ooks      -0.14
aka       -0.14
Positive Logits
stakes    0.15
lobal     0.15
elian     0.15
cker      0.14
zzo       0.14
dig       0.14
fulness   0.14
quiv      0.14
llib      0.14
ĵ         0.14
Activations Density 0.075%