Explanations
terms related to threats and violence
references to threats and violent actions
Negative Logits
Parables             -0.75
guiName              -0.72
eret                 -0.70
quickShipAvailable   -0.67
æ©Ł                  -0.63
Reviewer             -0.63
aepernick            -0.63
Costume              -0.63
ele                  -0.63
Videos               -0.62
Positive Logits
retaliation          1.07
retribution          1.05
repr                 1.01
eviction             0.95
annihilation         0.93
jeopard              0.84
endanger             0.84
blackmail            0.83
wrath                0.82
extinction           0.79
Activations Density 0.125%