INDEX
Explanations
casual conversational phrases and expressions of personal opinion
Negative Logits
ÑģÑĤи       -0.15
azal        -0.14
DMIN        -0.14
(æĹ¥        -0.14
æ¡          -0.13
nty         -0.13
à¹ĥ         -0.13
Disappear   -0.13
.Promise    -0.13
اÙĤع        -0.13
POSITIVE LOGITS
opus        0.16
Å¥          0.16
fed         0.15
apus        0.14
eil         0.14
fds         0.14
illard      0.14
690         0.14
ruk         0.14
ayo         0.13
Activations Density 0.523%