INDEX
    Explanations

    phrases that indicate expressions of agreement or acknowledgment

    Negative Logits
    Token         Logit
    "yš"          -0.15
    "ãİ"          -0.15
    "Ware"        -0.14
    "ardin"       -0.14
    "ãĤ¤ãĤ¯"      -0.14
    "عÛĮ"         -0.14
    "è¼Ŀ"         -0.14
    "سÙĪÙĨ"       -0.14
    "еÑī"         -0.13
    " Ñģло"       -0.13

    Positive Logits
    Token         Logit
    "uv"          0.17
    "st"          0.17
    "937"         0.16
    "ett"         0.15
    " interview"  0.15
    " rou"        0.14
    "澤"          0.14
    "oder"        0.14
    "lig"         0.14
    " fellow"     0.14
    Act Density 0.489%

    No Known Activations