INDEX
    Explanations

    phrases relating to personal experience and emotional expression

    Negative Logits
    "zew"           -0.17
    "esh"           -0.17
    "regnum"        -0.14
    " overall"      -0.14
    "alto"          -0.14
    "句"            -0.14
    " altogether"   -0.13
    "まま"          -0.13
    " itself"       -0.13
    "ito"           -0.13

    Positive Logits
    " میلادی"       0.19
    "/on"           0.15
    "eward"         0.15
    "gnore"         0.15
    " Bowen"        0.15
    "itals"         0.14
    "ürk"           0.14
    "ghan"          0.14
    "iaux"          0.14
    "번"            0.14
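    The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to produce such lists for a learned feature direction is to project the feature's decoder vector through the model's unembedding matrix and sort the per-token results. The sketch below assumes that method, which this page does not confirm; real dashboards may also apply normalization. The function `top_logit_effects` and the toy weights are hypothetical.

```python
# A minimal sketch, assuming tokens are ranked by the dot product of the
# feature's decoder direction with each column of the unembedding matrix.
# top_logit_effects and the toy vocab/weights below are hypothetical.
from typing import List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # unembed[i][j]: weight from residual dim i to token j
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    # Direct logit effect on token j: sum over i of decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [0.2, -0.1, 0.0, 0.3],
        [0.4, 0.2, -0.2, 0.0],
    ]
    neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
    print("Negative:", neg)
    print("Positive:", pos)
```

    Sorting once and slicing from both ends keeps the negative and positive lists consistent with each other, matching the page's layout of the most negative value first and the most positive value first.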
    Activation Density: 0.668%
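    An activation density of 0.668% means the feature fires on roughly 1 in 150 tokens (1 / 0.00668 ≈ 150). The sketch below assumes density is the share of tokens whose activation is strictly positive, expressed as a percentage; the exact threshold and token sample used for this page are not stated. The function `activation_density` and the sample values are hypothetical.

```python
# A minimal sketch, assuming "activation density" is the percentage of tokens
# on which the feature's activation exceeds a threshold (0 by default).
# activation_density and the sample data below are hypothetical.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`."""
    values = list(activations)
    if not values:
        return 0.0  # no tokens observed, so report zero density
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


if __name__ == "__main__":
    # 3 of 1000 sampled tokens have a positive activation
    acts = [0.0] * 997 + [1.2, 0.4, 2.5]
    print(f"Activation Density: {activation_density(acts):.3f}%")
```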

    No Known Activations