INDEX
Explanations
phrases related to political opinions and policy recommendations
Negative Logits
fortune      -0.74
76561        -0.70
Crazy        -0.64
whirlwind    -0.62
saw          -0.61
unlucky      -0.60
Guess        -0.59
Mini         -0.58
Adding       -0.58
culus        -0.57
Positive Logits
be           1.32
prevail      1.18
encompass    1.13
remain       1.12
belong       1.12
consist      1.06
occur        1.06
preclude     1.04
reflect      1.04
preced       1.03
Activations Density 0.176%