INDEX
Explanations
references to significant events or public figures in political discourse
Negative Logits
Token          Logit
Genie          -0.61
broom          -0.59
successive     -0.56
chairs         -0.56
fundamentals   -0.55
paio           -0.55
sterile        -0.53
phia           -0.53
secretaries    -0.53
deviations     -0.52
Positive Logits
Token          Logit
th             0.96
2017           0.87
2016           0.79
2015           0.79
eteenth        0.77
Reply          0.77
2017           0.75
ember          0.74
2014           0.74
2016           0.73
Activations Density: 0.024%