INDEX
    Explanations

    references to feelings of joy and happiness

    Negative Logits
    Token               Logit
    ":✨"                -0.50
    "<bos>"             -0.47
    " authenticate"     -0.46
    "InjectMocks"       -0.44
    " CanadaChoose"     -0.44
    "pective"           -0.43
    " EClass"           -0.42
    " induce"           -0.41
    " acute"            -0.41
    "ंदीखरीदारी"          -0.40
    Positive Logits
    Token               Logit
    " joy"              1.04
    " Joy"              0.60
    "joy"               0.60
    " enjoyment"        0.60
    " alegría"          0.59
    "Joy"               0.59
    " joys"             0.57
    " gioia"            0.57
    " alegria"          0.56
    " happiness"        0.54
    Activation Density: 0.207%

    No Known Activations