Explanations
references to the action of digging or related concepts
Negative Logits
ç·Ĵ       -0.18
enberg    -0.17
ulen      -0.16
adow      -0.16
deen      -0.16
arger     -0.15
uur       -0.15
彦        -0.15
adows     -0.15
ucus      -0.15
Positive Logits
dig       0.34
dig       0.32
Dig       0.30
ression   0.29
Dig       0.28
ested     0.26
digs      0.25
ress      0.25
digging   0.24
ãĤ¿ãĥ«    0.23
Activations Density 0.013%
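
Two of the tokens listed above, ç·Ĵ and ãĤ¿ãĥ«, look like mojibake but are more likely the display form of a byte-level BPE vocabulary. The sketch below assumes the underlying model uses the GPT-2 style byte-to-character mapping, which this page does not state, and shows how such a display string could be turned back into readable text. The helper names `byte_decoder` and `decode_token` are hypothetical and not part of any tool referenced here.

```python
def byte_decoder():
    """Invert the GPT-2 style byte-to-character table (assumed, not confirmed by this page)."""
    # Bytes that the scheme maps to themselves: printable, non-space characters.
    keep = set(range(33, 127)) | set(range(161, 173)) | set(range(174, 256))
    table = {chr(b): b for b in keep}
    # Every remaining byte is shifted to code points 256, 257, ... in byte order.
    offset = 0
    for b in range(256):
        if b not in keep:
            table[chr(256 + offset)] = b
            offset += 1
    return table


def decode_token(display, table=None):
    """Map each displayed character back to its byte, then decode the bytes as UTF-8."""
    table = table or byte_decoder()
    raw = bytes(table[ch] for ch in display)
    # A single token may hold only part of a multi-byte character, hence errors="replace".
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ç·Ĵ", "ãĤ¿ãĥ«", "dig"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the two strings decode to 緒 and タル, while plain ASCII tokens such as dig pass through unchanged. If the model uses a different tokenizer, the mapping does not apply and the strings should be left as shown.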