Explanations
proper nouns, particularly names and places
Negative Logits
ató      -0.17
shin     -0.16
rou      -0.15
athon    -0.15
FINITY   -0.15
スカ     -0.15
Jaune    -0.15
Nİ       -0.14
::|      -0.14
#        -0.14
Positive Logits
Mens     0.31
Boat     0.28
pong     0.24
Dark     0.23
Amp      0.22
Yaw      0.21
Kw       0.21
gy       0.21
Ow       0.21
Dark     0.21
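The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists are commonly derived, assuming the effect on token t is the plain dot product of the feature's decoder direction with that token's unembedding vector (ignoring any final LayerNorm scaling). The names `logit_effects`, `top_logits`, `decoder_dir`, and `unembed` are hypothetical, and the toy 2-D vectors below are made up for illustration, not taken from the model.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Hypothetical: dot product of a feature's decoder direction with each
    token's unembedding vector (direct effect on that token's logit)."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most suppressed and the k most promoted tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive

# Toy example with made-up 2-D vectors (assumption, not real model weights).
unembed = {"Boat": [1.0, 0.0], "shin": [-1.0, 0.5], "Dark": [0.5, 0.5]}
neg, pos = top_logits(logit_effects([0.3, -0.1], unembed), k=1)
print(neg, pos)
```

Here "shin" ends up with the most negative effect (about -0.35) and "Boat" with the most positive (0.3), mirroring how the real lists above are ordered.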
Activation Density 0.021%
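Activation density is read here as the fraction of token positions on which the feature fires, so 0.021% works out to about 21 tokens per 100,000, or roughly 1 token in 4,760. A minimal sketch of that calculation, assuming "fires" means an activation strictly above a threshold of 0 (the function name `activation_density` and the `threshold` parameter are hypothetical):

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical: fraction of token positions whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

# 21 active positions out of 100,000 gives a density of 0.021%.
acts = [0.0] * 99_979 + [1.0] * 21
print(f"{activation_density(acts) * 100:.3f}%")  # 0.021%
```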