INDEX
Explanations
mentions of illegal activities
terms related to illegal activities
Negative Logits
enger      -0.81
pread      -0.81
oleon      -0.80
ivities    -0.79
rike       -0.75
etics      -0.75
phasis     -0.74
igating    -0.72
issance    -0.72
=-=-=-=-   -0.72
Positive Logits
illegal          1.02
trafficking      0.92
downloading      0.86
immigrant        0.86
illegally        0.82
immigrants       0.81
aliens           0.81
alien            0.81
substances       0.78
unfocusedRange   0.77
Activations Density: 0.026%