    Explanations

    references to collective actions or experiences involving "we" and "they."

    Negative Logits
    anson       -0.17
    erde        -0.15
    TEMPL       -0.15
    oplan       -0.15
    erap        -0.14
    urdu        -0.14
    erd         -0.14
    еÑĢк        -0.14
     Ù쨱ÙĪ      -0.14
    zos         -0.14
    Positive Logits
    celik       0.15
    025         0.15
    cs          0.14
    itten       0.14
     Bez        0.14
     creampie   0.14
    atable      0.13
    ulf         0.13
    G           0.13
    BR          0.13
    Act Density 0.115%

    No Known Activations