Explanations
words related to political matters
terms related to political issues and contexts
Negative Logits
shall       -0.73
Lotus       -0.70
upon        -0.70
amination   -0.69
ERY         -0.68
plan        -0.67
Trinity     -0.66
wolves      -0.66
val         -0.66
Vikings     -0.66
Positive Logits
speaking     0.84
correct      0.83
driven       0.82
challenging  0.79
motivated    0.78
challenged   0.78
handic       0.77
engineered   0.76
disag        0.76
opposed      0.75
Activations Density: 0.010%