INDEX
Explanations
references to "ICE" (Immigration and Customs Enforcement) - likely related to immigration enforcement activities
references to the Immigration and Customs Enforcement (ICE) agency
Negative Logits
orship     -0.91
gradient   -0.86
angers     -0.84
educ       -0.78
andals     -0.76
stable     -0.76
orage      -0.74
kers       -0.73
authent    -0.72
agi        -0.72
Positive Logits
ICE        1.27
SEA        0.83
BOX        0.82
IRO        0.78
ICE        0.78
HAEL       0.77
Exit       0.77
PLAY       0.74
ODY        0.74
lli        0.74
Activations Density 0.008%
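
A density figure like the one above is usually read as the share of tokens on which the feature fires at all. The sketch below shows that calculation under that assumption; the function name, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with a non-zero
    activation. The dashboard's exact definition is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 8 of 100,000 tokens.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.008%"
```

At this density the feature fires on roughly 8 tokens per 100,000, which fits a feature tied to one specific token such as "ICE".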