Explanations
references to human trafficking and slavery
Negative Logits
Brunswick    -0.16
立て         -0.16
archy        -0.15
reich        -0.15
aday         -0.14
ouden        -0.14
propagate    -0.14
isure        -0.14
stab         -0.14
ocl          -0.14
Positive Logits
Traff         0.27
traff         0.24
chatt         0.23
trafficking   0.23
exploitation  0.19
human         0.19
child         0.18
slavery       0.18
奴            0.17
_tra          0.16
Activations Density 0.094%