Explanations
words related to policymakers and decision-makers
references to policymakers and related roles in various contexts
Negative Logits
tein        -0.72
TPS         -0.70
bush        -0.68
Liberty     -0.67
Aw          -0.67
Valkyrie    -0.66
jri         -0.66
Revival     -0.66
Bloody      -0.61
Redemption  -0.61
Positive Logits
alike       1.17
everywhere  0.99
strive      0.95
beware      0.89
recognize   0.87
prescribe   0.86
wishing     0.85
aspire      0.84
perceive    0.83
hip         0.83
Activations Density: 0.267%