INDEX
    Explanations

    verbs related to decline or deterioration

    Negative Logits
     truthful         -0.62
    ortment           -0.61
     sarc             -0.61
     yourself         -0.60
     accountable      -0.60
     naming           -0.59
     identification   -0.59
     congrat          -0.58
     objective        -0.58
     examples         -0.58
    Positive Logits
    uates     1.06
    uated     1.02
    ighed     1.01
    uating    0.98
    uate      0.96
    iated     0.90
    ues       0.90
    ceed      0.89
    elled     0.88
    pped      0.85
    Activation Density: 0.047%

    No Known Activations