Explanations
actions related to violence and physical confrontations
Negative Logits
Token        Logit
兮           -0.56
spesa        -0.54
experiment   -0.53
C            -0.52
urity        -0.52
Sche         -0.51
Flux         -0.51
alej         -0.49
Flux         -0.49
Bill         -0.49
Positive Logits
Token        Logit
hitting       1.35
strike        1.31
hit           1.30
strikes       1.28
Schlag        1.25
punch         1.25
hammer        1.25
punches       1.24
blows         1.22
hits          1.21
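Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding and reading off the largest and smallest entries. That this page computes them this way is an assumption. The sketch below is pure Python and uses hypothetical names (`w_dec`, `w_unembed`, `logit_effects`, `top_and_bottom`). It stores the unembedding as one vector per vocabulary token and uses a made-up 2-d toy vocabulary, not this feature's real weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(w_dec: List[float], w_unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the (hypothetical) decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(w_dec, vec)) for tok, vec in w_unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Hypothetical toy example: 2-d residual stream, 4-token vocabulary.
w_dec = [1.0, 0.5]
w_unembed = {
    "hit": [1.0, 0.6],
    "punch": [0.9, 0.5],
    "Flux": [-0.4, -0.2],
    "spesa": [-0.3, -0.6],
}
effects = logit_effects(w_dec, w_unembed)
top, bottom = top_and_bottom(effects, k=2)
print("positive:", top)
print("negative:", bottom)
```

Real implementations usually work on matrices, such as a decoder row times a `d_model x d_vocab` unembedding. Some may also apply normalization first. The ranking step is the same either way.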
Activation Density: 0.266%
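Activation density is read here as the percentage of sampled token positions on which the feature's activation is above zero. That definition is an assumption about how this dashboard computes the figure. The minimal sketch below uses a hypothetical helper `activation_density` and made-up activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [2.1, 0.7, 4.3]
print(f"{activation_density(acts):.3f}%")  # prints "0.300%"
```

Under this reading, a density of 0.266% means the feature fires on roughly 1 token in 376, since 100 / 0.266 ≈ 376.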