Explanations
references to reading and the experience of engaging with written content
Negative Logits
출          -0.16
Wor         -0.15
iske        -0.15
resher      -0.14
quin        -0.14
ubes        -0.14
nesc        -0.13
Template    -0.13
newline     -0.13
qv          -0.13
Positive Logits
reading     0.41
read        0.37
reads       0.33
reading     0.32
阅读        0.32
Reading     0.31
Reading     0.30
读          0.30
read        0.29
-read       0.28
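Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. This page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The helper `logit_effects`, its parameters, and the toy vocabulary are hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple

Score = Tuple[str, float]


def logit_effects(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Score], List[Score]]:
    """Rank tokens by how strongly a feature direction pushes their logits.

    ASSUMPTION: each token's score is the dot product of the feature's
    decoder vector with that token's unembedding vector (a logit-lens
    style projection). Real dashboards may scale or normalize differently.
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_vec, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


# Toy 2-dimensional example; every number here is made up.
decoder = [1.0, 0.5]
vocab = {
    "book": [0.4, 0.2],
    "page": [0.3, 0.6],
    "gear": [-0.2, 0.0],
    "fog": [0.0, -0.1],
}
top, bottom = logit_effects(decoder, vocab, k=2)
print([t for t, _ in top])     # ['page', 'book']
print([t for t, _ in bottom])  # ['gear', 'fog']
```

Sorting the full score table once keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting everything.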
Activation Density: 0.238%
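Activation density is usually read as the share of token positions on which the feature fires. The page does not define "fires", so this minimal sketch assumes it means an activation strictly above a threshold of 0. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions where the feature is active.

    ASSUMPTION: "active" means activation > threshold (0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy data: one active position out of 400 (values invented for illustration).
acts = [0.0] * 399 + [1.7]
print(activation_density(acts))  # 0.25
```

Under this reading, a density of 0.238% means the feature is active on roughly 238 of every 100,000 tokens.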