INDEX
Explanations
expressions of gratitude and appreciation
Negative Logits
here        -0.16
pride       -0.15
ushing      -0.14
via         -0.14
542         -0.14
oni         -0.14
a           -0.14
oproject    -0.14
orp         -0.14
ine         -0.14
Positive Logits
having      0.24
being       0.21
how         0.20
hearing     0.18
having      0.18
seeing      0.18
Having      0.18
cómo        0.17
heck        0.17
Having      0.17
Activations Density 0.062%