Explanations
Instances of the word "do" and its variations in different contexts.
Negative Logits
Token     Logit
doing     -0.21
never     -0.20
do        -0.18
nt        -0.17
b         -0.17
ni        -0.17
gr        -0.17
rary      -0.17
par       -0.17
more      -0.17
Positive Logits
Token     Logit
zed       0.24
led       0.20
oming     0.20
able      0.20
ctype     0.20
ctr       0.19
recall    0.19
zen       0.19
(es       0.19
xor       0.18
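Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes exactly that procedure. The helper name `logit_effects`, its argument layout, and the choice to ignore any final layer norm are assumptions for illustration, not this dashboard's documented method.

```python
import numpy as np


def logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by how much one feature pushes their logits.

    Assumption (hypothetical helper): the effect on each token is the raw dot
    product of the feature's decoder vector with that token's unembedding
    column; any final layer norm is ignored.
    """
    # w_dec_row: (d_model,), w_unembed: (d_model, vocab_size)
    effects = np.asarray(w_dec_row) @ np.asarray(w_unembed)  # (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with a 2-dimensional model and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 0.1, -0.4],
                      [9.0, 9.0, 9.0, 9.0]])
negative, positive = logit_effects(w_dec_row, w_unembed, vocab, k=2)
print("negative:", negative)  # [('d', -0.4), ('b', -0.2)]
print("positive:", positive)  # [('a', 0.5), ('c', 0.1)]
```

Because the decoder vector in the toy example has no component along the second dimension, the large values in that row of `w_unembed` have no effect on the ranking.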
Activation Density: 0.047%
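Activation density is usually read as the fraction of token positions on which the feature is active. Under that reading, 0.047% means the feature fires on roughly 47 of every 100,000 tokens. The sketch below assumes "active" means an activation strictly above zero; the helper name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's own definition and token sample may differ.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions on which a feature is active.

    Assumption (hypothetical helper): a feature counts as active when its
    activation is strictly greater than `threshold`.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Toy usage: 2 of 5 positions are active, i.e. a density of 40%.
acts = [0.0, 0.0, 0.5, 0.0, 1.2]
print(f"{100 * activation_density(acts):.3f}%")  # 40.000%
```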