INDEX
    Explanations

    negative emotional descriptors and expressions of suffering

    Negative Logits
    Logit   Token
    -0.16   U+0300 (combining grave accent)
    -0.16   " tou"
    -0.15   "о"
    -0.15   "ِك" (U+0650 U+0643)
    -0.14   "iesta"
    -0.14   "а"
    -0.14   " hete"
    -0.13   " æ"
    -0.13   " quali"
    -0.13   " tic"
    Positive Logits
    Logit   Token
    0.24    " و"
    0.22    "،"
    0.21    " ک"
    0.21    " ب"
    0.20    " بر"
    0.20    U+0650 (Arabic kasra)
    0.20    " با"
    0.20    " س"
    0.20    " من"
    0.20    U+200C (zero-width non-joiner)
    Activation Density: 0.006%

    No Known Activations