INDEX
    Explanations

    references to personal experiences and self-referential statements

    Negative Logits
    Token            Logit
    "ness"           -0.23
    " themselves"    -0.23
    " itself"        -0.20
    "nya"            -0.20
    "ly"             -0.19
    "ship"           -0.18
    "naire"          -0.18
    "Ùĩا"            -0.17
    "weise"          -0.17
    "wise"           -0.17
    Positive Logits
    Token            Logit
    "/us"            0.58
    "/her"           0.34
    "/my"            0.29
    "adows"          0.29
    "zzo"            0.28
    " personally"    0.28
    "SELF"           0.28
    "adow"           0.28
    "andering"       0.25
    "-même"          0.25
    Activation Density: 0.117%

    No Known Activations