INDEX
Explanations
references to memoirs and autobiographical content
Negative Logits
constitu    -0.82
ulhu        -0.70
coord       -0.70
OPA         -0.68
jurisd      -0.65
unintended  -0.63
Reserved    -0.62
pots        -0.60
retard      -0.60
yg          -0.60
Positive Logits
memoir         1.27
autobiography  1.12
oir            0.92
lishing        0.91
autobi         0.91
ogue           0.88
writer         0.87
writers        0.84
spective       0.82
essay          0.81
Activations Density 0.012%