Explanations
proper nouns related to political figures and locations
references to Afghanistan and Uzbekistan
Negative Logits
cribed    -0.74
chool     -0.68
*/(       -0.67
rent      -0.66
Pixie     -0.65
ensor     -0.63
Kafka     -0.62
Merrill   -0.62
creen     -0.61
ection    -0.61
Positive Logits
istan     1.62
ghan      1.23
istani    1.11
igan      1.05
igans     0.91
awan      0.89
isma      0.85
thur      0.85
wen       0.84
amara     0.84
Activations Density 0.004%