Explanations
mentions of the name "Bush" at a particularly high activation level
references to former President Bush and his administration's actions
Negative Logits
Token       Logit
semble      -0.72
Qiao        -0.67
Cth         -0.65
Else        -0.64
Norn        -0.62
Harmony     -0.62
ymph        -0.62
clipboard   -0.60
feature     -0.60
Pixie       -0.60
Positive Logits
Token       Logit
nell        1.21
ido         1.05
master      0.93
lett        0.92
minster     0.80
Bush        0.80
Hussein     0.78
men         0.78
enegger     0.76
bour        0.76
Activations Density: 0.014%