    Explanations

    expressions and mentions of happiness

    Negative Logits
    Logit  Token
    -0.39  " Roof"
    -0.36  " disponibilités"
    -0.36  " Độ"
    -0.35  "Становништво"
    -0.35  "NameInMap"
    -0.35  "strokeStyle"
    -0.35  "zieher"
    -0.34  "wyżs"
    -0.33  "-------"
    -0.33  "<bos>"

    Positive Logits
    Logit  Token
     0.79  "happy"
     0.74  "Happy"
     0.71  "HAPPY"
     0.71  " HAPPY"
     0.68  " Happy"
     0.63  " happy"
     0.60  "happ"
     0.57  " felices"
     0.57  "makeText"
     0.57  " للمعارف"

    Activation Density: 0.009%

    No Known Activations