INDEX
    Explanations

    positive moral attributes or qualities such as nobility and goodness

    Negative Logits
    "aq"              -0.75
    "esville"         -0.74
    "ingo"            -0.72
    "oan"             -0.70
    "ing"             -0.69
    " Controlled"     -0.68
    "olina"           -0.67
    "aby"             -0.67
    "olver"           -0.65
    "yss"             -0.65

    Positive Logits
    " deeds"           1.12
    " intentions"      0.97
    "minded"           0.93
    " laureate"        0.93
    " indignation"     0.89
    " gentlemen"       0.88
    " virtues"         0.87
    " minded"          0.87
    " gentleman"       0.87
    " pursuits"        0.85
    Activation Density: 0.196%

    No Known Activations