INDEX
    Explanations

    instances of happiness or positive emotional expressions

    Negative Logits
    "SAN"      -0.15
    "ching"    -0.15
    "å½"       -0.14
    "layer"    -0.14
    "hea"      -0.14
    "age"      -0.14
    " dra"     -0.13
    " prop"    -0.13
    "949"      -0.13
    "aż"       -0.13
    Positive Logits
    " Dip"     0.17
    "तम"       0.16
    "ione"     0.15
    "itech"    0.15
    "frica"    0.15
    "ιά"       0.15
    "Ñĩина"    0.15
    "isd"      0.14
    "¶Į"       0.14
    "apt"      0.14
    Activation Density: 0.028%

    No Known Activations