INDEX
    Explanations

    comparative phrases that highlight preferences or contrasts

    Negative Logits
    ãĥ«ãĤ¯
    -0.15
    ÙħÙĦ
    -0.15
     ilma
    -0.15
    æĬŀ
    -0.15
    æijĺè¦ģ
    -0.15
    oug
    -0.14
    ross
    -0.14
    utivo
    -0.14
    coc
    -0.14
    éŁ¿
    -0.14
    Positive Logits
     any
    0.17
    phy
    0.17
    izu
    0.16
    ecies
    0.15
    šek
    0.15
     éĹ
    0.14
    ÑĢиз
    0.14
    orean
    0.14
    ÄĻk
    0.14
     actual
    0.14
    Activation Density: 0.061%

    No Known Activations