INDEX
Explanations
terms related to political events and controversies
occurrences of sensitive political information and events
Negative Logits
gren      -0.66
course    -0.66
boro      -0.65
cius      -0.65
gements   -0.64
lihood    -0.64
Cat       -0.62
utical    -0.62
manoeuv   -0.61
iates     -0.60
Positive Logits
³³³                0.74
arta               0.71
³³³³³³³³           0.70
Specifically       0.69
³³³³               0.65
³³³³³³³³³³³³³³³³   0.65
WASHINGTON         0.64
Writing            0.63
WARN               0.63
REUTERS            0.61
Activations Density 0.331%