INDEX
    Explanations

    adjectives describing low or negative qualities

    negative descriptors related to quality or performance

    Negative Logits
    "leans"     -0.84
    "ju"        -0.73
    "plane"     -0.72
    "frey"      -0.72
    "llers"     -0.70
    "cise"      -0.70
    "cript"     -0.70
    "planes"    -0.68
    "alde"      -0.68
    "lean"      -0.67
    Positive Logits
    "glers"            0.88
    " miser"           0.88
    "luster"           0.80
    " Downs"           0.75
    " rollout"         0.68
    " incompet"        0.68
    " excuses"         0.68
    "nesses"           0.68
    " incompetence"    0.68
    " Spac"            0.66
    Activation Density: 0.085%

    No Known Activations