INDEX
    Explanations

    phrases that express appreciation or gratitude towards others

    Negative Logits
    Token       Logit
    "hani"      -0.15
    "heet"      -0.14
    "AGO"       -0.14
    " себе"     -0.14
    "killer"    -0.14
    ".ribbon"   -0.14
    "ころ"      -0.14
    "hest"      -0.13
    ".gc"       -0.13
    "ắn"        -0.13
    Positive Logits
    Token       Logit
    " being"    0.33
    " having"   0.29
    "being"     0.23
    " Being"    0.23
    "Being"     0.22
    " Having"   0.21
    "Having"    0.19
    " daring"   0.19
    "having"    0.19
    " not"      0.18
    Activation Density: 0.095%

    No Known Activations