INDEX
    Explanations

    instances of the word "smile" and other expressions of happiness or friendliness

    Negative Logits
    Token         Logit
    "jsdelivr"    -0.61
    " ab"         -0.60
    " near"       -0.58
    "madu"        -0.56
    " pod"        -0.55
    " N"          -0.55
    " Ed"         -0.55
    (blank)       -0.54
    " In"         -0.54
    " Pod"        -0.54
    Positive Logits
    Token         Logit
    " smile"      3.11
    " smiles"     2.72
    " Smile"      2.64
    "Smile"       2.46
    "smile"       2.39
    " smiling"    2.33
    " smiled"     2.26
    " Smiles"     2.18
    " Smiling"    2.06
    "smiles"      1.97
    Activation Density: 0.049%

    No Known Activations