    Explanations

    reported speech and statements made by individuals

    Negative Logits
    "あげ"            -0.17
    "bet"             -0.16
    "aga"             -0.15
    "edBy"            -0.14
    "kö"              -0.14
    "uve"             -0.13
    "яв"              -0.13
    "κυ"              -0.13
    "uba"             -0.13
    " somehow"        -0.13
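    Some token strings in these lists come from a byte-level BPE vocabulary, which stores each UTF-8 byte as a printable stand-in character. For example, the raw vocabulary string "èĩªå·±" is the word 自己. The page does not name its tokenizer, so the following is a minimal sketch that assumes the GPT-2 byte-to-character table: it maps a raw vocabulary string back to bytes and decodes them as UTF-8. The helper names are illustrative only, and strings that mix stand-ins with already-decoded characters are not handled (they raise KeyError).

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable stand-in character, as in GPT-2-style byte-level BPE (assumed)."""
    # Printable bytes keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (controls, space, 127-160, 173) is shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: stand-in character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw byte-level vocabulary string back into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token can end mid-character, so substitute U+FFFD instead of failing.
    return raw.decode("utf-8", errors="replace")


for tok in ["èĩªå·±", "ãģĤãģĴ", "à¸ķà¸Ļ", "Ġitself"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

    Under this table "Ġ" stands for the space byte, which is why many vocabulary entries begin with it.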
    Positive Logits
    "自己"            0.20
    " itself"         0.20
    " 자신"           0.17
    " themselves"     0.17
    " himself"        0.16
    " kendisine"      0.15
    "WS"              0.15
    "ตน"              0.15
    " sua"            0.14
    " сво"            0.14
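    Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The page does not say how its values were computed, so this is a minimal sketch under that assumption. The functions logit_effects and top_and_bottom are hypothetical, and the 2-dimensional model and four-token vocabulary are made-up toy numbers, not this feature's weights.

```python
def logit_effects(decoder_row: list[float], unembed: list[list[float]]) -> list[float]:
    """Score each vocabulary token by dotting the decoder direction with its W_U column."""
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: list[float], vocab: list[str], k: int):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    top = ranked[::-1][:k]  # most positive first
    bottom = ranked[:k]     # most negative first
    return top, bottom


# Toy numbers for illustration only; they are not this feature's weights.
vocab = [" itself", " himself", " somehow", "bet"]
decoder_row = [1.0, 0.5]         # hypothetical W_dec[feature], d_model = 2
unembed = [                      # hypothetical W_U, shape (d_model, vocab_size)
    [0.2, 0.1, -0.1, -0.2],
    [0.0, 0.1, 0.0, 0.1],
]

top, bottom = top_and_bottom(logit_effects(decoder_row, unembed), vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

    In this toy setup " itself" (0.2) and " himself" (about 0.15) rank highest, while "bet" (about -0.15) and " somehow" (-0.1) rank lowest.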
    Activation Density: 0.133%
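    Activation density is read here as the share of evaluated tokens on which the feature fires. A minimal sketch follows, assuming "fires" means an activation strictly above zero (the page does not state its threshold). The helper activation_density is hypothetical, and the activations are toy values, not data for this feature.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations: 4 firing tokens out of 3,000 (not real data for this feature).
acts = [0.0] * 3000
for i in (10, 500, 1200, 2999):
    acts[i] = 1.7

density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.133%"
```

    Read the other way, a density of 0.133% means the feature is active on roughly 1 token in 750.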

    No Known Activations