INDEX
    Explanations

    instances of "I" to identify self-referential expressions

    Negative Logits
    olik            -0.17
    .Formatting     -0.17
    zyst            -0.16
    rž              -0.14
    аниÑĨ           -0.14
    é«ĺæ¸ħ          -0.14
    ίνη             -0.14
    åŁ              -0.14
    iteral          -0.14
    ych             -0.13
    Positive Logits
    nn              0.15
     rip            0.15
     loved          0.15
    ı               0.14
    entes           0.14
    _dl             0.14
    entin           0.14
     Daly           0.14
     Entity         0.14
     Fauc           0.14
    Activation Density: 0.252%

    No Known Activations