INDEX
Explanations
violent actions involving physical contact
instances of violent actions or physical attacks
Negative Logits
Token            Logit
sylvania         -0.70
ilight           -0.66
Reconstruction   -0.64
venant           -0.63
obook            -0.62
omsky            -0.61
etts             -0.60
ieri             -0.59
phase            -0.58
achusetts        -0.58
Positive Logits
Token            Logit
stood            1.36
standing         1.23
regard           1.14
impunity         1.09
regards          1.09
draw             1.07
holding          1.01
drawn            0.94
respect          0.91
dignity          0.90
Activations Density 0.195%
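
The density figure above can be read as the fraction of token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen purely to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (here: above-threshold) feature activation. The source page does not
    confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 195 of 100,000 token positions.
acts = [1.0] * 195 + [0.0] * (100_000 - 195)
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.195%
```

A density this low would mean the feature is active on roughly one token in five hundred, which is typical of a sparse feature.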