Explanations
Proper nouns or names
Tokens before periods or special characters
Specific suffixes and prefixes
Negative Logits
Token         Logit
\<^           -1.32
ratulations   -1.31
Roskov        -1.27
leſs          -1.26
>\<^          -1.26
itſelf        -1.24
^(@)          -1.24
intenance     -1.24
elfare        -1.22
litude        -1.21
Positive Logits
Token         Logit
k             0.88
us            0.85
il            0.81
.             0.80
es            0.80
ss            0.79
v             0.79
um            0.78
te            0.76
g             0.76
Activations Density 0.838%