INDEX
    Explanations

    expressions of identity and self-awareness

    NEGATIVE LOGITS
     my
    -0.17
     мої
    -0.16
     meinem
    -0.16
    orz
    -0.15
     geschichten
    -0.15
     Heller
    -0.15
     meinen
    -0.15
     mijn
    -0.15
    ivant
    -0.15
    seau
    -0.15
    POSITIVE LOGITS
     I
    0.30
     Ι
    0.22
    I
    0.21
     І
    0.20
    "I
    0.20
     İ
    0.19
    'I
    0.19
    _I
    0.19
    “I
    0.18
     И
    0.18
    Act Density 0.070%

    No Known Activations