INDEX
Explanations
references to time durations and reading lengths
Negative Logits
meric    -0.16
felt     -0.15
ALS      -0.14
kara     -0.14
iphy     -0.14
hazi     -0.14
oultry   -0.14
opo      -0.14
ston     -0.13
216      -0.13
Positive Logits
read     0.25
ago      0.24
读       0.20
reading  0.20
ago      0.20
reads    0.19
-read    0.19
reads    0.17
걸       0.16
riott    0.16
Activations Density 0.015%