    Explanations

    instances where something can be improved or done better

    phrases indicating improvement or positive performance

    Negative Logits
    " Tru"            -0.75
    "jected"          -0.75
    " Personality"    -0.69
    " Mastery"        -0.69
    "went"            -0.67
    "ipel"            -0.66
    "ixed"            -0.64
    " Cutter"         -0.63
    "shaw"            -0.61
    "ウス"            -0.61
    Positive Logits
    " grunt"          0.78
    " job"            0.75
    " injustice"      0.74
    "ingen"           0.71
    "benefit"         0.68
    " offline"        0.68
    " homework"       0.67
    " deed"           0.67
    "eret"            0.64
    " deserve"        0.64
    Activation Density: 0.077%

    No Known Activations