    Explanations

    expressions of satisfaction or approval towards outcomes and experiences

    Negative Logits
     elig     -0.15
    onth      -0.14
    ìĽĥ       -0.14
     madrid   -0.14
    elman     -0.14
     è©ķ      -0.14
    619       -0.14
    Ģ         -0.14
     Ukra     -0.14
    asan      -0.14
    Positive Logits
     overall  0.18
    IPA       0.16
    彦        0.15
     how      0.15
    askell    0.14
    nd        0.14
    extent    0.14
    izr       0.14
    overall   0.13
     cách     0.13
    Activation Density: 0.044%

    No Known Activations