INDEX
Explanations
violent actions involving physical contact or objects
words and phrases related to physical violence and objects associated with it
Negative Logits
rosso            -0.82
qualitative      -0.74
specificity      -0.73
autonomy         -0.67
KNOWN            -0.66
Alam             -0.64
Specific         -0.64
personalities    -0.63
conversions      -0.63
ITNESS           -0.62
Positive Logits
ledge            1.10
fence            0.96
mattress         0.95
bushes           0.91
pedest           0.91
sofa             0.88
railing          0.88
andel            0.88
stret            0.87
curtains         0.87
Activations Density: 0.299%
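
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the definition, not something the page states; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and exist only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition (not confirmed by the source page): a position counts
    as "active" when the feature's activation is strictly above the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): the feature fires on 3 of 1000 token positions.
toy_activations = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(toy_activations)

# Format as a percentage with three decimals, matching the style used above.
print(f"{density:.3%}")
```

Under this reading, a density of 0.299% would mean the feature is active on roughly 3 out of every 1000 tokens in the sampled text.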