INDEX
Explanations
phrases describing violent actions or events
phrases indicating violence or attacks
Negative Logits
soDeliveryDate             -0.83
ItemThumbnailImage         -0.71
SourceFile                 -0.70
inventoryQuantity          -0.69
tions                      -0.69
BuyableInstoreAndOnline    -0.68
swick                      -0.67
paren                      -0.67
aird                       -0.66
PsyNet                     -0.66
Positive Logits
unsuspecting    1.17
erous           0.96
behalf          0.84
unarmed         0.79
hordes          0.78
astically       0.78
subordinates    0.77
indiscrim       0.76
unwitting       0.76
anyone          0.76
Activations Density: 0.210%