INDEX
    Explanations

    complex and nuanced forms of negativity or criticism

    Negative Logits
    ability      -0.21
    isation      -0.21
    ization      -0.20
    uation       -0.20
    lessness     -0.20
    eration      -0.20
    ivism        -0.20
    stration     -0.19
    emption      -0.19
    urement      -0.19
    Positive Logits
    eworthy      0.30
    arious       0.28
    acious       0.27
    urious       0.27
    orous        0.26
    ughty        0.25
    urous        0.25
    volent       0.25
    omorphic     0.24
    inous        0.24
    Activation Density: 0.389%

    No Known Activations