Explanations
words related to psychological attachments and detachment
Negative Logits
teasp     -0.82
inois     -0.69
widget    -0.66
sky       -0.64
omsky     -0.64
©¶æ¥µ     -0.63
NS        -0.63
MX        -0.63
nder      -0.62
ãĤ¢ãĥ«    -0.62
Positive Logits
them         0.96
something    0.94
anything     0.91
these        0.90
whichever    0.89
what         0.87
whatever     0.82
this         0.82
the          0.81
those        0.81
Activations Density 2.944%
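
The density figure above can be read as a simple ratio. The following is a minimal sketch, assuming that "activation density" means the fraction of sampled tokens on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly greater than `threshold`.

    Assumption: density is counted per token, and a token counts as
    "active" when its activation exceeds the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations for one feature (not data from this page).
sample = [0.0, 0.0, 1.3, 0.0, 0.7, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # 2 of 8 tokens are active
```

Under this reading, a density of 2.944% would mean the feature fires on roughly 3 of every 100 sampled tokens.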