Explanations
phrases indicating the inability to resist or stop doing something
Negative Logits
Token      Logit
วà¸ĩ        -0.16
izzle      -0.15
olas       -0.15
oya        -0.15
á»Ĩ        -0.15
atch       -0.14
úng        -0.13
æħĮ        -0.13
amento     -0.13
qing       -0.13
Positive Logits
Token      Logit
notice     0.22
being      0.21
but        0.20
wonder     0.19
noticed    0.19
notices    0.18
Notices    0.18
noticing   0.18
feeling    0.17
lessly     0.17
Activation Density: 0.011%