INDEX
Explanations
phrases that express personal opinions or subjective preferences
Negative Logits
my          -0.16
私の        -0.15
Remark      -0.14
Remark      -0.14
мо          -0.14
uname       -0.14
alar        -0.14
enate       -0.14
Incredible  -0.14
ンデ        -0.13
Positive Logits
prefer      0.19
personally  0.19
hearing     0.18
whenever    0.18
prefer      0.18
anything    0.17
gim         0.17
preference  0.17
Favorite    0.17
pref        0.16
Activations Density 0.175%
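
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires. The page itself does not define the metric, so the sketch below rests on that assumption. The function name `activation_density`, the `threshold` parameter, and the toy activation list are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation; the source page does not spell this out.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (illustrative only): 7 active positions out of 4000 tokens.
toy = [0.0] * 3993 + [1.2] * 7
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.175%
```

Under this reading, a density of 0.175% would mean the feature is active on roughly 1 token in 570, which fits a narrow feature tied to preference-related wording.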