Explanations
names of political figures
Negative Logits
  Token      Logit
  bed        -0.73
  bing       -0.71
  stress     -0.66
  cki        -0.65
  Cascade    -0.63
  guided     -0.63
  PRESS      -0.60
  unity      -0.59
  ending     -0.58
  ffer       -0.57
Positive Logits
  Token      Logit
  iage        1.00
  ials        0.97
  ians        0.95
  iments      0.94
  igans       0.94
  iants       0.92
  iating      0.92
  ial         0.91
  iane        0.91
  teenth      0.89
Activation density: 0.183%
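The sketch below shows how the two kinds of summary numbers on a dashboard like this one are commonly derived. It is an illustration under assumptions, not the dashboard's actual code. Activation density is taken to be the fraction of sampled tokens on which the feature's activation is strictly positive. The logit lists are taken to be the k largest and k smallest entries of the feature's direct effect on each vocabulary token's logit, for example its decoder direction dotted with the model's unembedding. The function names and the toy data are hypothetical. A few (token, value) pairs are copied from the lists above, and one filler token is made up.

```python
# Hypothetical sketch of how dashboard summary statistics are commonly computed.
# Function names and toy inputs are illustrative assumptions, not a real API.

def activation_density(activations):
    """Fraction of tokens on which the feature activation is strictly positive."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


def top_and_bottom_tokens(logit_effects, k):
    """Return the k most positive and the k most negative (token, value) pairs.

    logit_effects maps each vocabulary token to the feature's assumed direct
    effect on that token's logit (e.g. decoder direction dotted with unembedding).
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy activations: 2 of 1000 tokens fire, giving 0.2% density.
    acts = [0.0] * 998 + [1.3, 0.4]
    print(f"{activation_density(acts):.3%}")

    # A few values from the lists above plus one made-up filler token ("the").
    effects = {"iage": 1.00, "ials": 0.97, "bed": -0.73, "bing": -0.71, "the": 0.01}
    pos, neg = top_and_bottom_tokens(effects, 2)
    print(pos)
    print(neg)
```

On a real dashboard the density would be measured over a large token sample, and the logit effects would cover the full vocabulary rather than five tokens.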