INDEX
Explanations
mentions of violent actions
references to legal or ethical concerns related to actions and their consequences
Negative Logits
quartered        -0.61
anan             -0.54
xtap             -0.53
doi              -0.51
largeDownload    -0.51
uscript          -0.49
antine           -0.47
heastern         -0.46
iple             -0.46
Gallup           -0.46
Positive Logits
)).        0.95
?".        0.91
)),        0.83
?).        0.83
?",        0.82
)?         0.79
anymore    0.79
"))        0.74
))         0.74
?),        0.71
Activations Density: 1.849%