INDEX
Explanations
references to violence and physical confrontations
Negative Logits
للمعارف          -0.42
SpringBootTest   -0.41
Espèce           -0.40
kasarigan        -0.40
########.        -0.40
pleaños          -0.39
dstuk            -0.39
expandindo       -0.38
vician           -0.38
niczy            -0.37
POSITIVE LOGITS
brawl            0.61
fight            0.58
fight            0.57
unarmed          0.57
fist             0.55
fights           0.54
fights           0.53
altercation      0.53
fists            0.52
Fights           0.52
Activations Density: 0.447%