INDEX
Explanations
punctuation marks and their frequency
Negative Logits
gh        -0.14
abei      -0.14
ola       -0.13
bourg     -0.13
rlen      -0.13
ropp      -0.13
din       -0.13
æĬŀ       -0.13
velt      -0.12
.which    -0.12
Positive Logits
etc       0.26
etc       0.22
iban      0.20
gone      0.16
dig       0.15
Ä©        0.14
akedown   0.14
undry     0.14
kas       0.14
.problem  0.13
Activations Density 0.290%
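
Some tokens in the logit lists above, such as `æĬŀ` and `Ä©`, look like the printable byte aliases that GPT-2-style byte-level BPE vocabularies use for multi-byte UTF-8 characters. The page does not say which tokenizer produced them, so that is an assumption. Under it, the sketch below rebuilds the standard byte-to-character table and maps a vocabulary string back to text; `decode_token` is a hypothetical helper name, not part of any dashboard or library API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table of GPT-2-style byte-level BPE.

    Assumption: the dashboard's vocabulary uses this scheme.
    """
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to code point 256 + offset.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: alias character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to the text it encodes (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a character, hence errors="replace".
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æĬŀ", "Ä©", "etc"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens pass through unchanged, and a leading `Ġ` in such vocabularies stands for a space. If the tokens came from a different tokenizer, this mapping does not apply.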