INDEX
    Explanations

    positive descriptive words and expressions

    adjectives and descriptive phrases expressing positivity or critique

    Negative Logits
    Token                 Logit
    "ヘラ"                -0.70
    "anges"               -0.70
    "adj"                 -0.69
    "opy"                 -0.64
    "_-"                  -0.63
    "ould"                -0.62
    "=-=-=-=-=-=-=-=-"    -0.61
    "OULD"                -0.61
    "Domain"              -0.61
    "annot"               -0.60
    Positive Logits
    Token                 Logit
    " lately"             1.09
    " fruitful"           1.00
    " since"              0.99
    " unsuccessful"       0.81
    " successful"         0.79
    " awhile"             0.78
    " productive"         0.75
    " steady"             0.74
    " steadily"           0.72
    "since"               0.72
    Activation Density: 0.310%

    No Known Activations