Explanations
references to Afghanistan and Afghan individuals
Negative Logits
creen       -1.12
tower       -0.82
ometimes    -0.80
hare        -0.71
nces        -0.70
igne        -0.69
FUL         -0.69
ayers       -0.68
retty       -0.67
Harden      -0.67
Positive Logits
istani       1.36
istan        1.33
Afghan       0.97
Taliban      0.96
Afgh         0.90
ghan         0.88
Afghans      0.85
civilians    0.80
Kurd         0.80
rug          0.79
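The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Dashboards of this kind commonly build such lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest scores. The minimal sketch below assumes exactly that, ignores any final layer norm, and uses hypothetical function names (`logit_effects`, `top_and_bottom_tokens`) and toy matrices rather than real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with each
    vocabulary column of a d_model x vocab unembedding matrix.
    Assumption: no final layer norm is applied before unembedding."""
    d_model = len(decoder_direction)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom_tokens(
    effects: Sequence[float],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, score) pairs,
    each holding the k most extreme tokens, most extreme first."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values).
    vocab = ["istan", "Afghan", "tower", "creen"]
    direction = [1.0, 0.5]
    unembed = [[1.0, 0.5, -0.5, -1.0],
               [0.6, 0.8, -0.6, -0.4]]
    effects = logit_effects(direction, unembed)
    positive, negative = top_and_bottom_tokens(effects, vocab, k=2)
    print([t for t, _ in positive])  # ['istan', 'Afghan']
    print([t for t, _ in negative])  # ['creen', 'tower']
```

Because the score depends only on the decoder direction and the unembedding, a list like this describes what the feature writes to the output rather than what makes it fire.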
Activations Density 0.007%
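The density figure is the share of sampled token positions on which the feature is active. The minimal sketch below assumes "active" means an activation strictly above zero; the `threshold` parameter and the function name `activation_density` are hypothetical. For example, 7 active positions out of 100,000 would correspond to a density of 0.007%.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.
    Assumption: a position counts as active when its activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Hypothetical sample: 7 active positions out of 100,000.
    acts = [1.0] * 7 + [0.0] * 99_993
    print(activation_density(acts))  # 0.007
```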