INDEX
Explanations
Phrases related to political opinions and actions, with a focus on particular political figures or movements.
Negative Logits
ibli        -0.68
fortunes    -0.62
ouf         -0.60
angu        -0.59
perty       -0.58
ounge       -0.56
Hist        -0.54
square      -0.53
MORE        -0.53
Shift       -0.53
Positive Logits
by            1.03
expressly     0.84
during        0.84
jointly       0.82
collabor      0.81
pursuant      0.80
instituted    0.73
artificially  0.72
unanimously   0.72
aback         0.69
Activations Density 0.259%
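
The density figure above is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration, not the method the dashboard actually uses.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a token counts as "active" when its activation exceeds
    the threshold (0.0 here). The source does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 26 of which activate the feature.
sample = [0.0] * 9974 + [1.5] * 26
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.259% would mean the feature is active on roughly 26 of every 10,000 tokens.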