Explanations
instances of the word "one"
Negative Logits
Token      Logit
thane      -0.16
mouseup    -0.16
nds        -0.15
hee        -0.15
arken      -0.15
anca       -0.14
ouce       -0.14
mini       -0.14
tems       -0.14
ouple      -0.14
Positive Logits
Token      Logit
among      0.32
amongst    0.27
among      0.24
Among      0.23
Among      0.19
-third     0.19
of         0.17
-half      0.17
leg        0.16
the        0.16
Activations Density: 0.045%