INDEX
Explanations
mentions of political figures and elections
structural elements of news articles or stories
Negative Logits
Token      Logit
Stage      -0.18
tremend    -0.16
Ye         -0.16
osate      -0.15
Py         -0.15
dyn        -0.15
Race       -0.15
Recall     -0.15
noble      -0.15
Dyn        -0.14
Positive Logits
Token             Logit
ONSORED           0.22
taxp              0.18
Reilly            0.18
soDeliveryDate    0.18
20439             0.18
outwe             0.17
ãĥ¼ãĥ³            0.17
uable             0.17
annie             0.17
understatement    0.17
Activations Density 12.671%