Explanations
expressions of desire or intent related to personal goals and aspirations
Negative Logits
Token        Logit
erse         -0.18
Yourself     -0.17
erti         -0.16
ureau        -0.15
udas         -0.15
hu           -0.15
akers        -0.15
weg          -0.15
kea          -0.15
yourselves   -0.15
Positive Logits
Token        Logit
/ne          0.23
nothing      0.23
want         0.20
only         0.20
/            0.20
them         0.19
desperately  0.18
to           0.18
something    0.18
rid          0.17
Activations Density: 0.067%
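
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number is computed: the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration; the source does not say how its value was produced or how many tokens were sampled.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    `activations` is a sequence of per-token activation values for one feature.
    Both the name and the threshold convention are assumptions for this sketch.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 3 of 4,500 token positions.
sample = [0.0] * 4497 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on the vast majority of tokens, which fits a feature tied to a narrow pattern such as statements of wanting or intent.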