INDEX
Explanations
mentions of asylum seekers
references to asylum seekers
Negative Logits
Token      Logit
oried      -0.74
ories      -0.68
ORED       -0.67
oker       -0.67
orie       -0.66
Shore      -0.66
uncture    -0.66
otion      -0.64
nect       -0.63
Magn       -0.63
Positive Logits
Token      Logit
anamo      0.99
seekers    0.87
'          0.86
atis       0.80
detainees  0.77
']         0.77
tics       0.73
detained   0.73
fleeing    0.72
wana       0.70
Activations Density: 0.066%