Explanations
punctuation marks and expressions of emotion or tone
Negative Logits
ceae      -0.16
mor       -0.16
cons      -0.15
tra       -0.14
è§        -0.14
twig      -0.14
oÄį       -0.14
hung      -0.14
roids     -0.14
&_        -0.14
Positive Logits
stan       0.15
živ        0.14
Yao        0.14
artin      0.14
avis       0.14
enson      0.14
lÃŃ        0.14
_DM        0.14
semb       0.14
.IGNORE    0.14
Activation Density: 0.184%
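Activation density is commonly read as the share of tokens on which a feature fires. The sketch below assumes that reading and uses a firing threshold of zero. The function name and toy data are hypothetical, and the dashboard may define density differently.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical toy data: the feature fires on 2 of 1000 tokens.
acts = [0.0] * 998 + [1.3, 0.7]
print(f"{activation_density(acts):.3f}%")  # 0.200%
```

At that rate, a density of 0.184% would mean the feature fires on roughly 1.8 tokens per thousand.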