Explanations
instances of the word "up."
Negative Logits
Token     Logit
nist      -0.17
onne      -0.16
uss       -0.16
edom      -0.15
ighbor    -0.15
zas       -0.14
nal       -0.14
нам       -0.14
eb        -0.14
ersen     -0.13
Positive Logits
Token     Logit
oids      0.16
alim      0.15
mps       0.14
Mixin     0.14
062       0.14
with      0.14
065       0.14
TOTYPE    0.14
ç̬         0.13
idity     0.13
Activations Density 0.007%