Explanations
personal pronouns specifically related to addressing the reader
Negative Logits
Token       Logit
andon       -0.15
osite       -0.14
Fcn         -0.14
atto        -0.14
dde         -0.14
Bilim       -0.14
bjerg       -0.14
ilestone    -0.14
assorted    -0.14
lé          -0.13
Positive Logits
Token       Logit
Already     0.19
suspect     0.19
already     0.18
457         0.18
Already     0.17
lucky       0.17
696         0.16
maal        0.16
plan        0.15
absolutely  0.15
Activations Density: 0.090%