INDEX
Explanations
phrases related to illegal activities, specifically trafficking
terms related to human and illegal trafficking
Negative Logits
thus              -0.71
eely              -0.69
¯¯                -0.69
Fitness           -0.68
semble            -0.67
++++++++++++++++  -0.66
Metatron          -0.66
Pyr               -0.66
Einstein          -0.65
tes               -0.65
Positive Logits
trafficking   1.49
Traff         1.15
traffickers   0.99
traff         0.94
proble        0.84
wana          0.82
offenders     0.82
offenses      0.82
exploited     0.80
smuggling     0.79
Activations Density: 0.009%