Explanations
concepts and discussions related to crime and harm
Negative Logits
imson       -0.16
ovat        -0.16
Kok         -0.16
Karma       -0.15
vae         -0.15
František   -0.14
McCorm      -0.14
acha        -0.14
izm         -0.14
ิง          -0.14
Positive Logits
crime     0.48
Crime     0.40
crime     0.39
crimes    0.38
-cr       0.36
Crime     0.35
.cr       0.32
Crimes    0.31
_cr       0.30
犯罪      0.30
Activations Density 0.055%