INDEX
Explanations
expressions of perception or opinion
Negative Logits
Token       Logit
themselves  -0.16
nev         -0.14
uest        -0.14
入れ        -0.14
himself     -0.14
oes         -0.13
herself     -0.13
τους        -0.13
hte         -0.13
itself      -0.13
Positive Logits
Token   Logit
like    0.26
Like    0.22
clear   0.21
Like    0.20
like    0.19
_like   0.18
.like   0.18
như     0.18
likes   0.18
LIKE    0.18
Activations Density 0.035%