INDEX
    Explanations

    statements emphasizing identity or self-awareness

    Negative Logits
    Ñĭл             -0.14
    eniable         -0.14
    ecided          -0.14
    ddit            -0.14
    anel            -0.14
    verages         -0.14
    تر            -0.13
    kos             -0.13
    hn              -0.13
    lor             -0.13
    Positive Logits
     excess         0.16
    enant           0.15
    nature          0.14
    اسÙĩ          0.14
    ereum           0.14
    kem             0.14
     #__            0.14
    rám             0.14
     :";↵           0.13
     PartialView    0.13
    Activation Density: 0.072%

    No Known Activations