INDEX
Explanations
references to the pronoun "you."
Negative Logits
tings     -0.15
jist      -0.14
rais      -0.14
ê±°ëŀĺ    -0.14
nob       -0.14
probably  -0.14
inds      -0.13
utron     -0.13
ut        -0.13
iable     -0.13
Positive Logits
ever      0.24
haven     0.24
hasn      0.21
Haven     0.19
hadn      0.19
somehow   0.17
haven     0.17
essel     0.17
squ       0.16
EVER      0.16
Activations Density 0.046%