INDEX
Explanations
references to the timing and context of events
Negative Logits
Reform       -0.15
credited     -0.14
itten        -0.13
Reed         -0.13
enth         -0.13
öh           -0.13
readcr       -0.13
du           -0.13
Bas          -0.13
alter        -0.12

Positive Logits
writing       0.40
writing       0.31
Writing       0.28
Writing       0.27
publishing    0.27
-writing      0.26
publication   0.26
write         0.24
press         0.24
typing        0.24
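The logit lists show the vocabulary tokens whose output logits this feature most decreases (negative) or increases (positive). Feature dashboards commonly derive these by projecting the feature's decoder direction onto the model's unembedding matrix. The sketch below assumes that method. The names `decoder_dir`, `W_U`, `vocab`, and `top_logits` are hypothetical, and the data is a toy example, not values from this feature.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k tokens with the largest and smallest direct logit effect.

    Assumes decoder_dir has shape (d_model,) and W_U has shape (d_model, vocab_size).
    """
    effects = decoder_dir @ W_U          # direct effect on every vocabulary logit
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values).
toy_vocab = ["writing", "Reform", " writing", "du"]
toy_W_U = np.array([[0.4, -0.15, 0.31, 0.0],
                    [0.9,  0.9,  0.9, 0.9]])
toy_dir = np.array([1.0, 0.0])
pos, neg = top_logits(toy_dir, toy_W_U, toy_vocab, k=2)
print(pos)
print(neg)
```

Entries that print identically, such as the repeated "writing", are likely distinct tokens that differ only in whitespace (for example, with and without a leading space), as the toy vocabulary illustrates.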
Activations Density 0.036%
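Activation density is the fraction of token positions on which the feature fires. The following is a minimal sketch. It assumes "fires" means an activation strictly above a threshold of 0, because the dashboard's exact threshold is not stated here. `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy activations over 4 token positions (hypothetical values).
toy_acts = np.array([0.0, 0.0, 1.2, 0.0])
print(f"{activation_density(toy_acts):.3%}")  # 25.000%
```

At a density of 0.036%, the feature would fire on roughly 36 of every 100,000 tokens.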