    Explanations

    references to the audience or reader in a persuasive context

    Negative Logits
      " itself"        -0.27
      " themselves"    -0.26
      "ly"             -0.17
      "mond"           -0.16
      " usted"         -0.15
      "Clr"            -0.15
      "pei"            -0.15
      "icle"           -0.14
      "l"              -0.14
      "onaut"          -0.14
    Positive Logits
      "/us"             0.39
      "SELF"            0.33
      "’re"             0.32
      " guys"           0.32
      "-même"           0.31
      "/her"            0.28
      "'re"             0.27
      "nger"            0.27
      "/me"             0.26
      " yourself"       0.26
    Activation Density: 0.115%

    No Known Activations