INDEX
Explanations
references to political figures, specifically senators
references to senators, particularly in the context of political discussions
Negative Logits
ember          -0.74
PK             -0.61
emaker         -0.61
&              -0.57
©              -0.57
ifle           -0.56
eval           -0.56
performance    -0.56
ifter          -0.55
£              -0.55
Positive Logits
senators       3.67
Senators       2.72
senator        2.46
senate         2.12
Senator        2.10
Senate         2.04
lawmakers      2.01
Senator        2.01
legislators    1.98
Senate         1.97
Activations Density: 0.013%