INDEX
Explanations
titles of books
titles of books or works mentioned in the text
Negative Logits
yright      -0.70
hee         -0.70
ogle        -0.69
idine       -0.69
umably      -0.65
undai       -0.64
rost        -0.64
bered       -0.63
outube      -0.62
respective  -0.62
Positive Logits
Lessons     1.25
Causes      1.16
Bringing    1.14
How         1.13
Principles  1.09
Secrets     1.09
Strategies  1.08
Stories     1.08
Why         1.07
Making      1.05
Activations Density: 0.059%