INDEX
    Explanations

    expressions of personal beliefs and emotions

    Negative Logits
    emoc      -0.16
    adera     -0.15
    udu       -0.15
    ریز       -0.15
    noop      -0.15
    rys       -0.15
    .slim     -0.14
    VISION    -0.14
    .nlm      -0.14
    ύ         -0.14
    Positive Logits
     him      0.30
     ihn      0.26
     lui      0.22
     Him      0.22
     그를     0.21
     onun     0.20
     ihm      0.20
     HIM      0.18
     onu      0.17
     عنه      0.17
    Act Density 0.376%

    No Known Activations