INDEX
    Explanations

    words related to positive feelings or emotions

    expressions of positive feelings or happiness

    Negative Logits
    " distinguished"  -0.68
    "ring"            -0.67
    " favoured"       -0.65
    "ngth"            -0.62
    " Lif"            -0.62
    "inguished"       -0.61
    " disob"          -0.59
    "eming"           -0.58
    " favored"        -0.57
    "rown"            -0.56

    Positive Logits
    "enough"           0.82
    "stories"          0.82
    "waves"            0.69
    " enough"          0.67
    "paren"            0.66
    " Enough"          0.64
    " lapt"            0.64
    "Textures"         0.63
    " aloud"           0.63
    " alright"         0.62
    Activation Density: 0.065%

    No Known Activations