INDEX
Explanations
references to political figures and their actions
Negative Logits
senator     -0.18
ifik        -0.15
Senator     -0.15
губер       -0.15
Senator     -0.15
gra         -0.15
iku         -0.14
bish        -0.14
senators    -0.14
Senators    -0.14
Positive Logits
Minority    0.38
Speaker     0.37
speaker     0.33
minority    0.33
Speaker     0.32
Majority    0.32
Leader      0.32
Speakers    0.31
majority    0.29
Whip        0.29
Activations Density: 0.029%