INDEX
Explanations
instances of the word "press" in various contexts
Negative Logits
juan     -0.18
chy      -0.18
sổ       -0.16
tsy      -0.16
lify     -0.16
Verde    -0.15
qing     -0.15
sole     -0.15
ypse     -0.15
oby      -0.15
Positive Logits
uring    0.32
ur       0.30
ures     0.29
ured     0.28
sure     0.27
oir      0.25
umably   0.25
sing     0.25
sed      0.25
gang     0.24
Activations Density 0.025%