Explanations
references to acts of crime or wrongdoing
Negative Logits
Token        Logit
Superior     -0.15
attacker     -0.15
offending    -0.15
thern        -0.14
illa         -0.14
attackers    -0.13
ilo          -0.13
_compress    -0.13
possibility  -0.13
/ay          -0.13
Positive Logits
Token        Logit
acts         0.23
acts         0.22
suicide      0.21
treason      0.19
genocide     0.19
ocide        0.18
sudoku       0.17
adultery     0.17
burglary     0.17
адÑĥ         0.17
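The page does not say how the logit lists are produced. A common approach, assumed here and not confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector and report the extremes. The function name `top_logit_effects` and the dictionary-of-columns layout for the unembedding are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative token logit effects.

    Assumption (hypothetical): a token's logit effect is the dot product of
    the feature's decoder direction with that token's unembedding vector.
    """
    effects = {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembedding.items()
    }
    # Sort ascending by effect: the most negative tokens come first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]
    # Reverse for the most positive tokens first.
    positive = ranked[::-1][:k]
    return positive, negative


# Toy usage with made-up two-dimensional vectors.
pos, neg = top_logit_effects(
    [1.0, 0.0],
    {"acts": [0.23, 5.0], "attacker": [-0.15, 1.0], "the": [0.0, 2.0]},
    k=1,
)
print(pos, neg)
```

Under this assumption, a listed value such as 0.23 for "acts" would mean that adding the feature's decoder direction to the residual stream raises that token's logit by about 0.23 per unit of activation.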
Activation Density: 0.021%
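Activation density is presumably the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical, not taken from any dashboard API.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical): a position counts as firing when its
    activation is strictly greater than `threshold`, which defaults to 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 21 firing positions out of 100,000 sampled tokens.
sample = [1.0] * 21 + [0.0] * (100_000 - 21)
print(activation_density(sample))
```

On this reading, 0.021% means the feature is active on roughly 21 of every 100,000 tokens, so it is a very sparse feature.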