INDEX
Explanations
references to violence or actions involving physical confrontations
Negative Logits
ModelSerializer  -0.56
chero            -0.52
Serializer       -0.49
jari             -0.47
Filtration       -0.46
hdashline        -0.46
gitto            -0.46
tieren           -0.45
tenu             -0.45
maíz             -0.45
Positive Logits
hitting   1.69
hits      1.49
hit       1.49
strikes   1.49
strike    1.49
struck    1.40
hitting   1.39
striking  1.33
Hit       1.31
Hit       1.28
Activations Density 0.272%