Explanations
phrases indicating examples or comparisons
Negative Logits
Ố
-0.17
chet
-0.15
/goto
-0.15
дая
-0.14
فهوم
-0.14
Affero
-0.14
出品
-0.14
thers
-0.14
Hüs
-0.13
kus
-0.13
Positive Logits
following
0.29
seguint
0.26
:↵
0.24
以下
0.23
following
0.22
如下
0.21
Following
0.21
следующ
0.21
Following
0.20
다음과
0.20
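The two lists above are the tokens with the most negative and most positive logit values for this feature. As a minimal sketch of how such lists could be produced from (token, logit) pairs, the snippet below sorts a small sample and takes both extremes. The function name `top_logits` and the parameter `k` are hypothetical, and the sample contains only a subset of the pairs listed above; how the dashboard actually computes the logits is not stated in the source.

```python
from typing import List, Tuple

Pair = Tuple[str, float]

def top_logits(pairs: List[Pair], k: int = 2) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(pairs, key=lambda p: p[1])  # ascending by logit, stable for ties
    negative = ranked[:k]                       # most negative first
    positive = ranked[::-1][:k]                 # most positive first
    return negative, positive

# A subset of the pairs shown in the lists above.
sample: List[Pair] = [
    ("following", 0.29),
    ("seguint", 0.26),
    ("Following", 0.21),
    ("chet", -0.15),
    ("/goto", -0.15),
    ("Affero", -0.14),
    ("kus", -0.13),
]

negative, positive = top_logits(sample, k=2)
print("negative:", negative)
print("positive:", positive)
```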
Activations Density 0.084%
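One plausible reading of the density figure is the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the definition, the function name `activation_density`, the zero threshold, and the sample counts (84 active positions out of 100,000) are assumptions chosen to reproduce 0.084%, not values taken from the source.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 100,000 token positions, 84 of them active.
sample = [0.0] * 99_916 + [1.0] * 84
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.084%
```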