INDEX
Explanations
negative or critical language used in a political context
Negative Logits
Token         Logit
emis          -0.21
gypt          -0.21
roo           -0.20
":[           -0.19
hops          -0.19
ringe         -0.18
Sharing       -0.18
gnu           -0.18
transporting  -0.17
slave         -0.17
Positive Logits
Token         Logit
revelations   0.27
rebuke        0.26
ly            0.25
blow          0.24
indictment    0.23
revelation    0.23
condemnation  0.23
accusation    0.22
headlines     0.22
accusations   0.22
Activations Density: 11.208%