Explanations
words related to physical attack or aggression
references to "mugging" or the word "mug" in various contexts
Negative Logits
Token        Logit
ISION        -0.82
cision       -0.69
Virgin       -0.69
×Ļ×          -0.66
ipher        -0.66
IGH          -0.65
Domin        -0.65
Physicians   -0.64
SOURCE       -0.63
Empires      -0.63
Positive Logits
Token        Logit
gers         1.16
shots        1.08
ging         1.07
ger          1.04
itude        1.01
shot         1.00
atu          0.93
ged          0.93
gery         0.93
ifully       0.92
Activations Density: 0.009%
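
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled tokens on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 9 of 100,000 tokens.
sample = [0.0] * 99_991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.009% corresponds to the feature firing on roughly 9 of every 100,000 tokens, which suggests a sparse, narrowly targeted feature.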