INDEX
Explanations
names of famous personalities
mentions of specific individuals' names
Negative Logits
mine      -0.97
aic       -0.93
osition   -0.88
icular    -0.85
lisher    -0.85
joined    -0.84
ri        -0.84
rity      -0.84
minist    -0.83
iated     -0.82
Positive Logits
Wallace   0.90
Hayes     0.77
Ellis     0.76
Stevens   0.75
Strait    0.72
Williams  0.69
Hole      0.69
Owens     0.66
Waters    0.64
Tyson     0.64
Activations Density 0.109%