INDEX
Explanations
punctuation marks, specifically commas and apostrophes
Negative Logits
umas     -0.17
affen    -0.17
riad     -0.16
gili     -0.15
anoia    -0.15
ssa      -0.15
pmat     -0.14
STALL    -0.14
arest    -0.14
rosse    -0.14
Positive Logits
ing      0.20
Grace    0.16
grace    0.16
sil      0.15
mods     0.14
Grace    0.14
arial    0.14
yper     0.14
_defs    0.14
silence  0.14
Activations Density: 0.007%