INDEX
Explanations
mentions of political front-runners
repeated references to political candidates or front-runners in elections
Negative Logits
Reward        -0.61
Curve         -0.61
Definitions   -0.61
Kard          -0.59
Crime         -0.57
Reincarn      -0.57
Mean          -0.56
FORE          -0.56
nia           -0.55
ution         -0.55
Positive Logits
runners       1.41
runner        1.16
iers          1.11
page          1.00
loading       0.96
office        0.95
ben           0.95
bench         0.94
liner         0.94
liners        0.91
Activations Density: 0.036%