INDEX
    Explanations

    personal pronouns and the expression of individual experiences

    Negative Logits
    ctions  -0.15
    :  -0.15
     organisation  -0.15
    worthy  -0.14
    af  -0.14
    ander  -0.14
     e  -0.14
    ади  -0.14
    adi  -0.14
    itore  -0.14
    Positive Logits
    tas  0.17
    ROID  0.16
    ¥  0.15
    tower  0.14
     Crosby  0.14
    âĸĪâĸĪ  0.14
    vang  0.14
    irsch  0.14
    ãĤ½ãĥ³  0.14
    ilers  0.14
    Activation Density: 0.250%

    No Known Activations