INDEX
    Explanations

    positive adjectives or phrases indicating benefits or advantages

    phrases indicating positive outcomes or benefits

    Negative Logits
    Token           Logit
    "iper"          -0.73
    "Downloadha"    -0.72
    "eters"         -0.70
    "racuse"        -0.68
    "opers"         -0.67
    "pter"          -0.66
    "illon"         -0.65
    "Bow"           -0.64
    "asket"         -0.63
    "hyde"          -0.63
    Positive Logits
    Token           Logit
    " enough"       1.16
    "enough"        1.08
    " Enough"       0.83
    "bye"           0.82
    " nat"          0.80
    " karma"        0.77
    " optics"       0.72
    " additions"    0.70
    " news"         0.70
    " surpr"        0.70
    Act Density 0.124%

    No Known Activations