    Explanations

    phrases that express appreciation and motivation

    Negative Logits
    " my"          -0.27
    " myself"      -0.27
    "让我"         -0.25
    "给我"         -0.24
    " mine"        -0.23
    " мне"         -0.22
    " tôi"         -0.22
    "我的"         -0.22
    " mijn"        -0.21
    " meu"         -0.21
    Positive Logits
    " we"           0.44
    " our"          0.42
    " ourselves"    0.40
    "我们"          0.36
    "我们的"        0.34
    "我們"          0.33
    "Our"           0.33
    " ours"         0.32
    "our"           0.32
    "，我们"        0.32
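    Lists like the two above are often produced by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the lowest- and highest-scoring tokens. The page does not say how its lists were computed, so the sketch below is an assumption. `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and the vectors are toy values, not this feature's weights.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Score every token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector, then return
    the k lowest-scoring and k highest-scoring tokens."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most suppressed tokens
    positive = ranked[::-1][:k]    # most promoted tokens
    return negative, positive

# Toy 2-d vectors invented for illustration; not this feature's weights.
unembed = {" we": [1.0, 0.0], " my": [-1.0, 0.0], " the": [0.0, 1.0]}
neg, pos = top_logit_effects([0.5, 0.1], unembed, k=1)
print(neg)  # [(' my', -0.5)]
print(pos)  # [(' we', 0.5)]
```

    Under this assumed method, a direction that points toward " we" and away from " my" yields exactly the kind of split shown above: first-person-plural tokens on the positive side and first-person-singular tokens on the negative side.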
    Act Density 0.036%
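    Activation density is usually the percentage of sampled token positions at which the feature's activation is above zero. The page does not state its threshold or sample, so both are assumptions here. `activation_density` is a hypothetical helper, and the 100,000-token sample with 36 firing positions is made up so that the result reproduces the 0.036% figure.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (the threshold of 0.0 is an assumption, not taken from the page)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: 36 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(36):
    acts[i * 1000] = 1.0
print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.036%
```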

    No Known Activations