INDEX
    Explanations

    words related to moral or personal virtues

    terms related to morality and virtue

    Negative Logits
    Token            Logit
    "ockets"         -0.82
    "oval"           -0.71
    "bered"          -0.67
    "grown"          -0.67
    "ZI"             -0.65
    "alone"          -0.61
    "gren"           -0.61
    " fragmented"    -0.60
    "cedented"       -0.60
    "KS"             -0.60

    Positive Logits
    Token            Logit
    " virtue"        1.02
    " signalling"    0.93
    " Virtue"        0.88
    " dilig"         0.83
    " signaling"     0.80
    "iosity"         0.78
    " Deity"         0.76
    "ienne"          0.76
    " distingu"      0.75
    " resil"         0.73

    Act Density 0.007%

    No Known Activations