Explanations
actions involving physical violence
Negative Logits
Token         Logit
lio           -0.86
ublic         -0.76
ablishment    -0.75
iors          -0.71
Angelo        -0.71
UX            -0.68
ylum          -0.67
bh            -0.67
greets        -0.67
Iss           -0.65
Positive Logits
Token         Logit
fingert        1.18
scissors       1.16
fists          1.06
fingers        1.05
finger         1.03
knife          1.03
pencil         1.01
claws          1.00
clenched       0.98
fingertips     0.97
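Dashboards like this one commonly derive per-token logit scores by projecting the feature's decoder direction through the model's unembedding matrix, then listing the highest and lowest scores. The sketch below assumes that method. The vectors, matrix, and helper names (`logit_effects`, `top_and_bottom`) are hypothetical toy values, not the real data behind these lists.

```python
def logit_effects(decoder_dir, W_U):
    """Project a (hypothetical) feature decoder direction through an unembedding.

    decoder_dir: list of d_model floats
    W_U: d_model x vocab matrix, given as a list of rows
    Returns one score per vocabulary token.
    """
    d_model = len(decoder_dir)
    vocab = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_and_bottom(scores, tokens, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy example: d_model = 2, vocabulary of 4 tokens (all numbers made up).
tokens = ["knife", "fists", "greets", "bh"]
decoder_dir = [1.0, -0.5]
W_U = [
    [0.8, 0.6, -0.4, -0.3],
    [-0.5, -0.6, 0.5, 0.8],
]
scores = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(scores, tokens, k=2)
print([t for t, _ in top])     # ['knife', 'fists']
print([t for t, _ in bottom])  # ['bh', 'greets']
```

A positive score means the feature, when active, pushes the model toward predicting that token; a negative score pushes away from it.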
Activation density: 0.154%
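Activation density is usually read as the share of sampled tokens on which the feature fires, that is, has an activation above zero. A density of 0.154% therefore means about 154 firing tokens per 100,000. The sketch below assumes that definition and a zero threshold. The function name and the synthetic activations are illustrative only.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes density = (# tokens with activation > threshold) / (# tokens).
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic data: 154 firing tokens out of 100,000.
acts = [0.0] * 99_846 + [0.5] * 154
print(f"{activation_density(acts):.3f}%")  # 0.154%
```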