    Explanations

    expressions of happiness and positive feelings

    Negative Logits
        erner              -0.17
        ndata              -0.16
        leans              -0.15
        etting             -0.15
        ------+------+     -0.14
        ÌĤ                 -0.14
        plevel             -0.14
        ered               -0.14
        ÄĽr                -0.14
        bsite              -0.14

    Positive Logits
        -go                 0.20
        /light              0.16
        yyyy                0.16
        -looking            0.15
        ogo                 0.15
        (er                 0.14
        fully               0.14
        dest                0.14
        acket               0.14
        lic                 0.14
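Dashboards of this kind commonly derive the logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and sorting the result. The sketch below assumes exactly that, ignores the final LayerNorm, and uses hypothetical function names, a made-up 2-dimensional residual stream and a 4-token toy vocabulary; it is not taken from this feature or from any particular library.

```python
# Hypothetical toy sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: the effect on token t is dot(w_dec, W_U[:, t]); the final LayerNorm
# that real models apply before unembedding is ignored here.

def logit_effects(w_dec, W_U, vocab):
    """Return a (token, effect) pair for every vocabulary entry."""
    d_model = len(w_dec)
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with column t of W_U.
        score = sum(w_dec[k] * W_U[k][t] for k in range(d_model))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Split effects into the k most positive and the k most negative tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example: row k of W_U holds component k of every token's unembedding vector.
vocab = ["-go", "fully", "erner", "ndata"]
W_U = [
    [0.5, 0.2, -0.3, 0.0],
    [0.1, 0.3, -0.1, -0.4],
]
w_dec = [0.4, 0.2]  # hypothetical decoder direction for the feature

positive, negative = top_and_bottom(logit_effects(w_dec, W_U, vocab), k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other, which mirrors how the two columns above are presented.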
    Activation Density: 0.034%
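Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.034% corresponds to roughly 34 of every 100,000 tokens. A minimal sketch under that assumption, with a hypothetical helper, a firing threshold of zero and made-up activations:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 3 of 10,000 tokens.
acts = [0.0] * 10_000
for i in (17, 4203, 9981):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```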

    No Known Activations