INDEX
    Explanations

    themes related to charity and moral principles

    Negative Logits
    Token           Logit
    "臣"            -0.19
    "azon"          -0.17
    "ogo"           -0.17
    " Điện"         -0.16
    "олов"          -0.15
    "rama"          -0.14
    "筆"            -0.14
    ".usermodel"    -0.14
    "žení"          -0.14
    " Dro"          -0.14
    Positive Logits
    Token           Logit
    " Cheer"        0.17
    " pom"          0.16
    " ed"           0.16
    "Tim"           0.16
    "å°"            0.15
    " Tim"          0.14
    " orderly"      0.14
    " Тим"          0.14
    "513"           0.14
    "oron"          0.13
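    The two logit lists rank vocabulary tokens by how strongly this feature pushes the model's output logits down or up. The page does not say how these values were computed. The sketch below shows one common approach, assuming the feature has a decoder direction `w_dec` that is projected directly through the unembedding matrix `W_U`, with any final layer norm ignored. The function name `top_logit_effects`, the toy matrices, and the five-token vocabulary are all hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=3):
    """Rank tokens by the direct effect of a feature direction on the logits.

    Hypothetical inputs (assumptions, not taken from the page):
      w_dec: (d_model,) decoder direction of the feature
      W_U:   (d_model, n_vocab) unembedding matrix
      vocab: list of n_vocab token strings
    """
    effects = w_dec @ W_U                  # (n_vocab,) logit change per token
    order = np.argsort(effects)            # indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 5 tokens.
vocab = ["a", "b", "c", "d", "e"]
W_U = np.array([[1.0, 0.0, -1.0, 0.5, 0.0],
                [0.0, 1.0, 0.0, -0.5, 3.0]])
w_dec = np.array([0.2, -0.1])

neg, pos = top_logit_effects(w_dec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    With these toy values, "e" and "c" receive the most negative effects, and "a" and "d" receive the most positive ones.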
    Activation Density 0.059%
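    Activation density is the fraction of sampled tokens on which the feature fires. At 0.059%, that is roughly 59 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero, since the page states neither its threshold nor the sample it used. The function `activation_density` and the synthetic activation array are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    acts: 1-D array of one feature's activations over a token sample
    (hypothetical input; the 0.0 threshold is an assumption).
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Hypothetical sample: 100,000 tokens, of which 59 are active.
acts = np.zeros(100_000)
acts[:59] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.059%
```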

    No Known Activations