INDEX
    Explanations

    references to self-awareness and identity

    Negative Logits
    eyh
    -0.18
     yapılır
    -0.15
     bulunur
    -0.15
     yapar
    -0.15
    icari
    -0.14
    _ping
    -0.13
     dont
    -0.13
    :size
    -0.13
     Bec
    -0.13
     kullanılır
    -0.13
    Positive Logits
     đang
    0.48
    正在
    0.42
     is
    0.40
     are
    0.38
    กำล
    0.35
     estamos
    0.31
     está
    0.30
    是在
    0.29
    是
    0.28
     是
    0.28
    Act Density 0.678%

    No Known Activations