Explanations
mentions of the author Christopher Hitchens
Negative Logits
pires        -0.80
otype        -0.70
inent        -0.70
agin         -0.66
æ©Ł          -0.65
orld         -0.63
UTH          -0.61
Philos       -0.59
æĥ           -0.59
peacefully   -0.58
Positive Logits
ched         1.29
ches         1.07
boxes        1.02
box          0.92
tle          0.88
chens        0.87
achi         0.83
ting         0.82
ted          0.81
gerald       0.79
Activations Density: 0.641%