    Explanations

    vocabulary related to morality and ethical concepts

    Negative Logits
    adık               -0.11
    ưới                -0.10
    ActionCreators     -0.10
    anzeigen           -0.10
    ureau              -0.10
    aciente            -0.10
    Backdrop           -0.09
    uegos              -0.09
    ataire             -0.09
    HeaderInSection    -0.09
    Positive Logits
     happiness          0.37
     sincerity          0.37
     honesty            0.35
     optimism           0.35
     greatness          0.35
     generosity         0.34
     integrity          0.34
     dignity            0.34
     humility           0.34
     excellence         0.34
    Activation Density: 12.308%

    No Known Activations