INDEX
Explanations
references to reading or written text
instances of the word "read" and its variations
Negative Logits
xon          -0.70
ño           -0.69
pload        -0.64
Credit       -0.64
afort        -0.63
ctions       -0.63
ella         -0.62
henko        -0.59
Enlarge      -0.58
ction        -0.58

Positive Logits
aloud         1.34
just          1.16
comprehension 1.11
mitt          0.91
dress         0.87
ied           0.87
excerpts      0.86
books         0.84
write         0.84
itatively     0.82
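In SAE feature dashboards of this kind, the Negative and Positive Logits lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that method; `top_logit_effects`, `w_dec_row`, `W_U` and `vocab` are hypothetical names, and a real dashboard may also fold in final layer-norm scaling.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Assumed method: direct effect of one feature's decoder direction on each token logit.

    w_dec_row: (d_model,) decoder vector for the feature (hypothetical input)
    W_U:       (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:     list of token strings, indexed like the columns of W_U
    """
    effects = w_dec_row @ W_U  # shape (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [3.0, 3.0, 3.0, 3.0]])
neg, pos = top_logit_effects(w, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

Read this way, the positive list above (aloud, comprehension, excerpts, books, write) suggests the feature directly boosts reading-related next tokens, consistent with the explanations.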
Activations Density 0.031%
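Activation density is usually understood as the fraction of token positions in a sample where the feature fires (activation above zero). The minimal sketch below assumes that definition and a zero threshold; `activation_density` is a hypothetical helper. Under this reading, a density of 0.031% (0.00031) means the feature is active on roughly 31 of every 100,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Assumed definition: fraction of token positions whose activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy example: the feature fires on 2 of 5 tokens.
d = activation_density([0.0, 0.2, 0.0, 0.0, 1.5])
print(f"{d:.1%}")  # 40.0%
```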