INDEX
Explanations
punctuation marks and sentence endings
Negative Logits
regon           -0.07
akk             -0.07
woff            -0.07
-transitional   -0.06
owell           -0.06
oice            -0.06
orris           -0.06
wolf            -0.06
cis             -0.06
kaar            -0.06
Positive Logits
679             0.06
739             0.06
ring            0.06
715             0.06
-under          0.05
Amb             0.05
urate           0.05
ãĥ©ãĥ³          0.05
emp             0.05
_under          0.05
Activations Density: 0.000%