INDEX
Explanations
statements and discussions about political positions and policies related to social issues
Negative Logits
Roose      -0.15
ipeg       -0.14
OSC        -0.14
à¤Ŀ        -0.13
Roosevelt  -0.13
ÑĢог      -0.13
ĵåIJį      -0.13
bÄĽ        -0.13
_mux       -0.13
gren       -0.13
Positive Logits
hta        0.18
ãĤ¤ãĥ«     0.15
iture      0.15
Aligned    0.15
беÑĤ      0.14
sop        0.14
herits     0.14
aye        0.13
arel       0.13
ukkan      0.13
Activations Density 0.126%