INDEX
Explanations
phrases prompting the reader to access further information or read additional content
instances of the word "Read" or related calls to action encouraging further reading
Negative Logits
anners     -0.61
anim       -0.59
otions     -0.59
idia       -0.59
acity      -0.59
lying      -0.59
ackle      -0.58
ountain    -0.58
ukong      -0.56
fo         -0.55
Positive Logits
Read       3.89
Read       2.17
READ       1.92
read       1.86
read       1.85
Write      1.65
READ       1.56
Reading    1.51
reads      1.49
reads      1.48
Activations Density: 0.015%