Explanations
proper nouns, particularly names of individuals
Negative Logits
Token        Logit
abra         -0.15
agini        -0.15
itizen       -0.15
uong         -0.14
_utilities   -0.14
ë¥           -0.14
ritch        -0.14
rame         -0.14
inki         -0.14
Metallic     -0.14

Positive Logits
Token        Logit
ibus          0.16
shadow        0.14
eck           0.14
Arms          0.14
sville        0.14
arpa          0.14
é¦            0.13
145           0.13
oz            0.13
ussy          0.13
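Dashboards of this kind usually report each token's direct logit effect: the feature's decoder direction projected through the model's unembedding matrix. The most negative and most positive tokens are then listed. The sketch below assumes that definition. The function names `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical illustrations, not the dashboard's own code or data.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir has length d_model, and unembed is a
    d_model x vocab_size matrix given as nested lists (row i = dimension i).
    """
    effects: Dict[str, float] = {}
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with the unembedding column for token t.
        effects[token] = sum(d * unembed[i][t] for i, d in enumerate(decoder_dir))
    return effects


def top_and_bottom(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Toy, made-up numbers: a 2-dimensional model with a 4-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.1, -0.25, 0.3],
    [0.1, 0.2, -0.2, 0.0],
]
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("most negative:", neg)
print("most positive:", pos)
```

With real weights, `k` would be 10 to match the two lists above.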

Activation density: 0.040%
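Activation density is usually defined as the fraction of sampled tokens on which the feature's activation is greater than zero. This definition is assumed here. A density of 0.040% is 0.0004 as a fraction, or roughly 4 active tokens in every 10,000. The sketch below shows the calculation. The function `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires.

    Assumption: a ReLU-style feature, so any strictly positive
    activation counts as "active".
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical sample: 4 active tokens out of 10,000.
sample = [0.0] * 9_996 + [1.2] * 4
print(f"{activation_density(sample):.3%}")  # prints 0.040%
```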