INDEX
Explanations
discussion of policies, political figures, and public affairs
Negative Logits
hoe           -0.75
owered        -0.74
culus         -0.71
rug           -0.70
Armored       -0.70
itching       -0.69
cigarettes    -0.69
bees          -0.68
bowling       -0.64
ynthesis      -0.64
Positive Logits
deserves      1.06
deserve       0.99
rightly       0.90
rightfully    0.90
deserved      0.88
therefore     0.88
respected     0.79
vou           0.78
justified     0.78
justifies     0.78
Activations Density: 0.210%