Explanations
words or suffixes related to names and titles
Negative Logits
Token    Logit
nd       -0.26
line     -0.21
rou      -0.20
nds      -0.20
st       -0.19
ness     -0.19
nya      -0.18
shan     -0.18
na       -0.18
nde      -0.17
Positive Logits
Token    Logit
utenant  0.24
ght      0.24
lectric  0.23
=edge    0.21
vements  0.20
=UTF     0.20
zsche    0.19
gos      0.19
gh       0.18
ÃŁen     0.18
Activations Density 0.077%