Explanations
instances of the word "read" and related terms that indicate reading activity or comprehension
Negative Logits
andom       -0.17
vrier       -0.17
eldorf      -0.17
kes         -0.15
chner       -0.15
owe         -0.15
partment    -0.15
ideos       -0.14
xi          -0.14
gia         -0.14
Positive Logits
just             0.35
/view            0.30
/watch           0.30
/list            0.29
/write           0.27
aloud            0.26
ied              0.26
apt              0.24
mitted           0.24
comprehension    0.24
Activations Density: 0.075%