INDEX
Explanations
words related to acts of aggression or violence
references to attacks, particularly in a violent or aggressive context
Negative Logits
Token            Logit
zl               -0.65
tz               -0.62
ETA              -0.61
atom             -0.60
Marketable       -0.60
Vert             -0.59
Supplementary    -0.58
Zip              -0.58
theless          -0.57
glomer           -0.57
Positive Logits
Token            Logit
attack           1.16
attack           1.09
attacks          0.96
Attack           0.92
spree            0.91
oise             0.87
attackers        0.86
attacks          0.86
ocalypse         0.83
CVE              0.81
Activations Density 0.031%
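
To put the density figure in perspective, the sketch below converts it into an expected number of activations. It assumes that "Activations Density" means the percentage of sampled tokens on which this feature has a nonzero activation; that reading, and the helper names `expected_activations` and `tokens_per_activation`, are illustrative assumptions and are not defined on this page.

```python
def density_to_fraction(density_percent: float) -> float:
    """Convert a density given in percent (e.g. 0.031) to a fraction of tokens."""
    return density_percent / 100.0


def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires, out of n_tokens.

    Assumes density is the share of tokens with a nonzero activation.
    """
    return density_to_fraction(density_percent) * n_tokens


def tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens between activations under the same assumption."""
    return 100.0 / density_percent


if __name__ == "__main__":
    density = 0.031  # percent, as reported above
    print(expected_activations(density, 1_000_000))
    print(tokens_per_activation(density))
```

Under that assumption, the feature is sparse: it fires on only a few hundred tokens per million, which fits a feature tied to a narrow topic such as attacks.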