INDEX
    Explanations

    phrases indicating actions and intentions

    Negative Logits
    ilan
    -0.17
    ись
    -0.15
    478
    -0.14
    uet
    -0.14
    ιώ
    -0.14
     Suk
    -0.13
    入れ
    -0.13
    /logging
    -0.13
    πή
    -0.13
    yte
    -0.13
    Positive Logits
    的是
    0.35
     is
    0.29
    장은
    0.20
    是一个
    0.18
     것은
    0.18
    로는
    0.18
    就是
    0.18
    子は
    0.18
     adalah
    0.18
     are
    0.18
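
    The entries in the logit lists are vocabulary tokens. Byte-level BPE tokenizers store each token as a string in which every UTF-8 byte is mapped to one printable character, so a token such as 的是 appears in the raw vocabulary as çļĦæĺ¯. The sketch below shows one way to turn such strings back into readable text. It assumes the GPT-2 style byte-to-character table; the page does not name the tokenizer, so that choice is an assumption, and `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the GPT-2 style (assumed here)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a byte-level token string back to text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (already readable text, plain
            # spaces) are passed through as their own UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # Tokens can end in the middle of a character, so decode leniently.
    return raw.decode("utf-8", errors="replace")


print(decode_token("çļĦæĺ¯"))  # → 的是
print(repr(decode_token("Ġis")))  # → ' is'
```

    One limitation of this sketch: a character such as ä in text that is already readable would be treated as a raw byte, so it is only suitable for strings known to be in byte-level form.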
    Act Density 0.089%
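
    Activation density is usually read as the fraction of tokens on which the feature is active. The sketch below illustrates that reading under stated assumptions: "active" is taken to mean an activation above a threshold of zero, `activation_density` is a hypothetical helper name, and the sample activations are synthetic, not taken from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 89 active tokens out of 100,000.
acts = [0.0] * 99_911 + [1.5] * 89
print(f"{activation_density(acts):.3%}")  # → 0.089%
```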

    No Known Activations