    Explanations

    positive emotional responses and expressions of gratitude

    Negative Logits
     либо
    -0.17
    unless
    -0.15
    ament
    -0.15
    acter
    -0.14
     either
    -0.14
    ultipart
    -0.14
    arna
    -0.14
     unless
    -0.14
     Nam
    -0.13
    rots
    -0.13
    Positive Logits
     finally
    0.35
    finally
    0.32
    这么
    0.30
     इतन
    0.27
     Finally
    0.27
    如此
    0.25
    Finally
    0.25
     such
    0.25
     이렇게
    0.23
    こんな
    0.21
    Activation Density: 0.268%

    No Known Activations