    Explanations

    concepts related to morality and ethical behavior

    Negative Logits
    Token           Logit
    "]);\n          -0.61
    #+#             -0.54
    %";             -0.53
    jectures        -0.53
    nocache         -0.51
    rophes          -0.51
     })\n           -0.51
     });\n          -0.50
    quiera          -0.50
    itaire          -0.50
    Positive Logits
    Token           Logit
     moral          1.12
     Moral          1.10
    moral           1.05
     morals         1.01
    Moral           1.01
     morality       1.00
     morally        0.95
     righteousness  0.92
     ethical        0.89
     ethics         0.86
    Activation Density: 0.553%

    No Known Activations