INDEX
    Explanations

    statements of personal experience and introspection

    Negative Logits
    Token                 Logit
    " Theſe"              -1.01
    " שוליים"             -1.00
    " חיצוניים"           -0.97
    "ništvo"              -0.96
    " виправивши"         -0.94
    " يتيمه"              -0.90
    " themſelves"         -0.90
    "ftagPool"            -0.88
    "ſelf"                -0.88
    "principalColumn"     -0.88
    Positive Logits
    Token                 Logit
    " my"                 1.01
    " I"                  1.01
    " myself"             0.79
    "my"                  0.72
    "私は"                0.60
    "私の"                0.60
    " me"                 0.60
    "I"                   0.58
    " My"                 0.57
    "ฉัน"                 0.56
    Activation Density: 0.844%

    No Known Activations