INDEX
Explanations
names, particularly people's names, including names with specific suffixes
Negative Logits
etine      -0.17
olec       -0.16
åĴ²        -0.14
ngu        -0.14
SSI        -0.14
bero       -0.14
hetto      -0.14
à¹ĥà¸ļ     -0.14
outr       -0.14
hand       -0.14
Positive Logits
usan       0.20
èĻ         0.16
unch       0.16
usz        0.15
867        0.15
porte      0.15
oose       0.14
uri        0.14
odom       0.14
èĬĻ        0.14
Activations Density 0.003%