INDEX
Explanations
proper nouns, specifically names of individuals
proper nouns, primarily names of people
Negative Logits
exting      -0.91
Þ           -0.89
pione       -0.84
theless     -0.82
ModLoader   -0.82
referen     -0.82
ccording    -0.81
awa         -0.77
eleph       -0.77
ò           -0.76
Positive Logits
zinski      0.96
uez         0.93
bard        0.93
eson        0.92
rower       0.88
rigan       0.88
aney        0.86
riott       0.85
ovich       0.85
pson        0.85
Activations Density: 0.089%