INDEX
Explanations
threatening actions or statements
instances of threats or aggressive intimidation
Negative Logits
ortunate             -0.94
erest                -0.92
eret                 -0.79
uitive               -0.77
correctly            -0.73
imum                 -0.71
ortun                -0.71
Subtle               -0.71
appreciated          -0.70
natureconservancy    -0.70
Positive Logits
confisc              1.44
blackmail            1.28
boycot               1.27
boycott              1.17
vandal               1.14
confiscated          1.09
sabotage             1.09
suing                1.09
extortion            1.08
retali               1.07
Activations Density: 0.851%