INDEX
Explanations
names of specific individuals, potentially related to sports or political figures
proper nouns, particularly names of people
Negative Logits
inel      -0.77
alogue    -0.77
isl       -0.76
oved      -0.76
than      -0.73
atan      -0.73
unden     -0.73
ern       -0.71
evil      -0.71
inet      -0.66
Positive Logits
Fitzpatrick    0.93
patrick        0.85
Alley          0.82
hler           0.74
eru            0.73
Ingram         0.72
Pryor          0.72
Cummings       0.68
Dickinson      0.68
Shapiro        0.67
Activations Density 0.020%