INDEX
Explanations
expressions of hope and optimism
Negative Logits
him        -0.17
them       -0.16
ivi        -0.15
herself    -0.15
lui        -0.15
rowning    -0.14
acro       -0.13
zs         -0.13
好き       -0.13
himself    -0.13
Positive Logits
they        0.33
lessly      0.33
that        0.32
someday     0.32
it          0.28
we          0.28
to          0.27
this        0.25
there       0.25
others      0.24
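The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists can be produced, assuming the dashboard takes the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix and ignores any final layer norm. The function names, the row/column layout of `unembed`, and the toy numbers are hypothetical and not taken from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Project a feature's decoder direction onto every vocabulary token.

    unembed is indexed as unembed[i][j]: residual dimension i, token j.
    Assumption: no final layer norm or bias is applied.
    """
    d_model = len(decoder_direction)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per residual dimension")
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        score = sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        effects.append((token, score))
    return effects


def top_and_bottom(
    effects: Sequence[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy numbers for illustration only, not the dashboard's data.
    direction = [1.0, -0.5]
    unembed = [[0.2, -0.1, 0.3], [0.4, 0.2, -0.2]]
    vocab = ["they", "him", "someday"]
    neg, pos = top_and_bottom(logit_effects(direction, unembed, vocab), k=1)
    print("negative:", neg)
    print("positive:", pos)
```

With a real model, `vocab` would hold the tokenizer's full vocabulary and `k` would be 10 to match the lists above.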
Activation Density 0.040%
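A density of 0.040% equals a fraction of 0.0004, or roughly 4 tokens in every 10,000. A minimal sketch of the calculation, assuming density means the share of sampled tokens on which the feature's activation is above zero; the threshold parameter and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds the threshold.

    Assumption: "density" counts tokens with activation strictly above zero.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 4 active tokens out of 10,000.
    acts = [0.0] * 9_996 + [1.2, 0.5, 3.0, 0.1]
    density = activation_density(acts)
    print(f"Activation Density {density:.3%}")  # → Activation Density 0.040%
```

The `:.3%` format spec multiplies by 100 and keeps three decimals, which matches how the figure above is displayed.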