INDEX
Explanations
actions and physical interactions involving violence or aggression
striking or hitting
Negative Logits
Token       Logit
солю        -0.37
EnableWeb   -0.36
</tfoot>    -0.33
kece        -0.33
efectiva    -0.33
DockStyle   -0.32
白い        -0.31
conditions  -0.31
baik        -0.31
ned         -0.30
Positive Logits
Token         Logit
mallet        0.55
hammer        0.55
ProtoMessage  0.52
bờ            0.50
StructEnd     0.48
oneofs        0.48
hammers       0.47
Bewußt        0.47
EconPapers    0.45
viață         0.44
Activations Density: 0.166%