INDEX
    Explanations

    phrases related to positive feedback or appreciation

    positive expressions of opinions or preferences

    Negative Logits (token, logit)
    "iliated"      -0.66
    "将"           -0.65
    " probable"    -0.63
    "Registered"   -0.60
    "interrupted"  -0.59
    "iped"         -0.58
    "WARE"         -0.58
    " lethal"      -0.57
    "bia"          -0.57
    "udder"        -0.56
    Positive Logits (token, logit)
    " because"     0.94
    " tho"         0.91
    " bec"         0.77
    "!!!!"         0.75
    " though"      0.73
    "cause"        0.72
    " uncond"      0.71
    "!!!!!!!!"     0.70
    "!!!"          0.70
    " :-)"         0.68
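    Each logit list holds ten tokens, taken from the two ends of one ranking: the lowest-valued tokens and the highest-valued tokens for this feature. Below is a minimal sketch of how such lists could be selected. It assumes every vocabulary token already has one scalar score. In sparse-autoencoder dashboards that score is commonly the feature's decoder direction projected through the model's unembedding, but this page does not say so. The helper `top_and_bottom_logits` and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple


def top_and_bottom_logits(
    scores: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-scoring and k highest-scoring tokens.

    `scores` maps each vocabulary token (leading spaces preserved) to its
    assumed logit effect for one feature. Hypothetical helper.
    """
    # Sort ascending by score; the ends of this ranking give both lists.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy subset of the values shown on this page.
example = {" because": 0.94, " tho": 0.91, "iliated": -0.66, " probable": -0.63, "!!!!": 0.75}
neg, pos = top_and_bottom_logits(example, k=2)
print(neg)  # [('iliated', -0.66), (' probable', -0.63)]
print(pos)  # [(' because', 0.94), (' tho', 0.91)]
```

    Keeping the tokens as raw strings preserves leading spaces. That matters because " because" and "because" are distinct tokens.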
    Activation density: 0.613%
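    Activation density is read here as the share of token positions on which the feature fires. The page does not state the firing threshold or the token sample, so this sketch assumes an activation strictly above zero. Under that reading, 0.613% is roughly 6 of every 1,000 tokens. A minimal sketch under that assumption (the function name `activation_density` is hypothetical):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not taken from the page.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 613 active positions out of 100,000 reproduces the 0.613% shown above.
acts = [1.0] * 613 + [0.0] * (100_000 - 613)
print(f"{activation_density(acts):.3f}%")  # 0.613%
```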

    No Known Activations