Explanations
instances of illegal or prohibited activities
references to illegal activities
Negative Logits
Token        Logit
nil          -0.76
achine       -0.75
erer         -0.73
ivities      -0.72
oras         -0.72
vation       -0.69
oran         -0.69
oleon        -0.69
anche        -0.69
enthus       -0.68
Positive Logits
Token        Logit
illegally     1.15
downloading   0.84
downloaded    0.81
copied        0.77
detained      0.77
infringing    0.74
unlawfully    0.72
obtained      0.71
accessing     0.70
imported      0.70
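Dashboards of this kind usually build the Negative Logits and Positive Logits lists by projecting the feature's decoder direction onto each token's unembedding vector and sorting the scores. This page does not say how its numbers were computed, so treat that method as an assumption. The minimal pure-Python sketch below uses invented 2-d toy vectors, chosen only so the scores echo a few rows from the lists, and hypothetical helper names (`logit_effects`, `top_and_bottom`). It is not the real model or the dashboard's own code.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector.

    Assumption: this projection is how the logit scores were obtained.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive tokens (descending) and the k most negative (ascending)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Hypothetical toy vectors, invented for illustration (not taken from any real model).
decoder_dir = [1.0, 0.5]
unembed = {
    "illegally": [1.0, 0.3],      # 1.0 + 0.15 = 1.15
    "downloading": [0.7, 0.28],   # 0.7 + 0.14 = 0.84
    "nil": [-0.8, 0.08],          # -0.8 + 0.04 = -0.76
    "achine": [-0.6, -0.3],       # -0.6 - 0.15 = -0.75
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", [tok for tok, _ in positive])  # ['illegally', 'downloading']
print("negative:", [tok for tok, _ in negative])  # ['nil', 'achine']
```

Sorting once in ascending order and reading from both ends gives both lists from a single pass. This matches how the page orders Negative Logits from most to least negative.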
Activation Density: 0.007%
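Activation density is commonly read as the share of token positions on which the feature is active. The page states neither the exact definition nor the threshold, so the sketch below assumes that "active" means an activation strictly above 0. Under that reading, 0.007% is about 7 firing positions per 100,000 tokens, which makes this a very sparse feature. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 7 of 100,000 token positions.
sample = [0.0] * 100_000
for i in range(7):
    sample[i * 10_000] = 2.5

print(f"{activation_density(sample):.3f}%")  # 0.007%
```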