INDEX
Explanations
occurrences of violence and injury-related terms
Negative Logits
Token      Logit
uien       -0.16
hape       -0.15
Sne        -0.14
itude      -0.14
idget      -0.14
azen       -0.14
retch      -0.14
IDGE       -0.14
uner       -0.14
inspace    -0.14
Positive Logits
Token             Logit
èĿ                0.14
plevel            0.14
queryInterface    0.13
amba              0.13
547               0.13
okol              0.13
ảnh               0.13
rung              0.13
pend              0.13
celik             0.13
Activations Density: 0.051%