INDEX
Explanations
mentions or discussions of political moderation
Negative Logits
arium        -0.73
chy          -0.71
stals        -0.70
atography    -0.69
sten         -0.68
raltar       -0.68
kefeller     -0.68
borne        -0.67
OGR          -0.66
Report       -0.65
Positive Logits
erate        1.07
sized        0.99
minded       0.97
xual         0.93
(<           0.80
lees         0.70
leaning      0.69
(~           0.69
medi         0.69
moderate     0.68
Activations Density 0.032%
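
As a rough illustration of what a density figure like this can mean, the sketch below assumes that density is the percentage of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are assumptions for illustration only; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: the feature fires on 32 of 100,000 tokens.
toy = [1.5] * 32 + [0.0] * 99_968
print(f"{activation_density(toy):.3f}%")  # prints "0.032%"
```

Under this reading, a density of 0.032% would correspond to the feature firing on roughly 3 tokens in every 10,000.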