Explanations
names related to individuals
proper nouns, specifically names related to individuals and locations
Negative Logits
ffe       -0.75
rise      -0.71
road      -0.70
earch     -0.68
bel       -0.67
meal      -0.66
ugh       -0.66
bell      -0.64
phrine    -0.64
dress     -0.64
Positive Logits
ique      1.18
icans     1.05
ance      1.04
ANCE      0.98
ick       0.98
ator      0.96
icks      0.94
icum      0.94
antly     0.88
ica       0.87
Activations Density: 0.051%