INDEX
Explanations
proper nouns related to individuals, particularly ones named Alexander
Negative Logits
Token        Logit
neys         -0.92
zee          -0.88
kered        -0.83
eling        -0.82
rosse        -0.80
atical       -0.78
employment   -0.77
eless        -0.77
ths          -0.75
elling       -0.74
Positive Logits
Token        Logit
Gust         0.94
Wang         0.83
Luthor       0.81
Cock         0.80
opoulos      0.77
Hamilton     0.75
Payne        0.74
Calder       0.71
Anton        0.71
Graham       0.70
Activations Density: 0.014%