Explanations
phrases indicating actions and intentions
Negative Logits
ilan        -0.17
ись         -0.15
478         -0.14
uet         -0.14
ιώ          -0.14
Suk         -0.13
入れ        -0.13
/logging    -0.13
πή          -0.13
yte         -0.13
Positive Logits
的是        0.35
is          0.29
장은        0.20
是一个      0.18
것은        0.18
로는        0.18
就是        0.18
子は        0.18
adalah      0.18
are         0.18
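Negative and positive logit lists like the ones above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the tokens with the most extreme values. The page does not state how its numbers were produced, so the following is only a minimal sketch under that assumption; `top_logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and the numbers are toy values.

```python
import numpy as np

def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, effect) pairs.

    decoder_dir: (d_model,) feature direction (hypothetical input)
    unembed:     (d_model, n_vocab) unembedding matrix (hypothetical input)
    vocab:       list of n_vocab token strings
    """
    effects = decoder_dir @ unembed   # per-token logit shift, shape (n_vocab,)
    order = np.argsort(effects)       # indices sorted from lowest to highest
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy inputs: a 2-dimensional model with a 3-token vocabulary.
toy_dir = np.array([1.0, 0.0])
toy_unembed = np.array([[0.5, -0.25, 0.0],
                        [1.0, 1.0, 1.0]])
neg, pos = top_logit_effects(toy_dir, toy_unembed, ["a", "b", "c"], k=1)
print(neg, pos)
```

On a real model the same call would use the trained decoder row for the feature and the full vocabulary, with `k=10` to match the length of the lists shown here.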
Activations Density 0.089%
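Activation density is usually read as the fraction of sampled token positions on which the feature fires at all. Assuming that definition (the page does not spell it out), 0.089% corresponds to roughly 89 active positions per 100,000 tokens. A minimal sketch with a hypothetical helper `activation_density` and toy data:

```python
import numpy as np

def activation_density(activations):
    """Fraction of token positions on which the feature is active (> 0)."""
    activations = np.asarray(activations)
    return float(np.count_nonzero(activations > 0) / activations.size)

# Toy data: 89 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:89] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.089%
```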