INDEX
Explanations
instances of the word "read" and related actions or prompts
Negative Logits
gia      -0.18
odem     -0.17
anner    -0.16
dra      -0.15
gni      -0.14
cí       -0.14
ples     -0.14
chl      -0.14
edith    -0.14
cce      -0.14
Positive Logits
ily      0.39
iness    0.38
ying     0.34
ym       0.33
just     0.32
ers      0.30
mission  0.30
ings     0.30
more     0.29
about    0.28
Activations Density 0.031%