INDEX
    Explanations

    first-person pronouns and expressions of personal opinions or experiences

    Negative Logits
    ÙĨب     -0.14
    azzo    -0.14
    inson   -0.14
    ût     -0.14
    让æĪij   -0.14
    ochen   -0.14
    uter    -0.13
    کت      -0.13
    anne    -0.13
    eks     -0.13
    Positive Logits
     wonder    0.22
     Agree     0.19
     agree     0.17
     Wonder    0.16
    ilik       0.16
    427        0.15
    agree      0.15
     wondered  0.15
     meant     0.15
    arda       0.14
    Act Density 0.195%

    No Known Activations