Explanations
Words related to political figures or entities
Proper nouns associated with notable entities or groups
Negative Logits
stration      -0.71
legram        -0.68
aunder        -0.67
amina         -0.67
hess          -0.67
ancial        -0.67
ript          -0.67
enda          -0.66
vironment     -0.66
chet          -0.66
Positive Logits
deems         0.79
deem          0.70
Enterprises   0.68
chose         0.67
chooses       0.66
stole         0.66
describes     0.64
assigns       0.64
fans          0.63
refers        0.63
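The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to produce such lists, assumed here rather than confirmed for this dashboard, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then sort. The sketch below uses plain Python lists; `logit_effects`, `top_bottom_logits`, and the toy vectors are hypothetical, and the dashboard may scale or normalize its scores differently.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding vector to get that token's direct logit effect."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_bottom_logits(effects: Dict[str, float], k: int = 10
                      ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-d vectors, made up purely for illustration.
    decoder = [1.0, 0.0]
    vocab = {"up": [0.8, 0.1], "down": [-0.7, 0.3], "flat": [0.0, 1.0]}
    pos, neg = top_bottom_logits(logit_effects(decoder, vocab), k=1)
    print("positive:", pos)
    print("negative:", neg)
```

With a real model, `unembed` would hold one vector per vocabulary entry, which is why subword fragments such as "stration" or "aunder" can appear in the lists.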
Activation density: 0.186%
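Activation density is usually read as the share of token positions on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above zero; the dashboard's actual threshold is not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in acts if a > threshold)  # count positions where the feature is active
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    # Toy activations: 2 of 4 positions are above zero.
    print(activation_density([0.0, 0.5, 0.0, 2.0]))
```

Under this reading, a density of 0.186% means the feature is active on about 186 of every 100,000 tokens, or roughly 1 in 540.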