INDEX
    Explanations

    personal pronouns in the text

    NEGATIVE LOGITS
    Token            Logit
    "lete"           -0.07
    "èģĺ"            -0.07
    "{}_"            -0.06
    "ovit"           -0.06
    "اÙĤÙĦ"          -0.06
    "alsex"          -0.06
    "lite"           -0.06
    ".utilities"     -0.06
    "LOUD"           -0.06
    "irts"           -0.06

    POSITIVE LOGITS
    Token            Logit
    "857"             0.07
    " opp"            0.07
    "eso"             0.07
    "rganization"     0.07
    " pert"           0.06
    " neutr"          0.06
    "cert"            0.06
    "iver"            0.06
    "uator"           0.06
    "ough"            0.06
    Activation density: 0.036%

    No Known Activations