Explanations
instances of the word "done."
Negative Logits
roller    -0.17
ÙĩÙħ      -0.17
craft     -0.17
mente     -0.17
ly        -0.17
ted       -0.17
ship      -0.16
doch      -0.16
rette     -0.16
side      -0.16
Positive Logits
pez       0.25
-done     0.19
actic     0.19
zo        0.18
hower     0.17
exion     0.17
justice   0.16
erts      0.15
deal      0.15
cket      0.15
Activations Density: 0.030%