INDEX
    Explanations

    references to moral values or moral concepts

    Negative Logits
     كومونز
    -0.84
    -0.74
    .";
    
    -0.68
    _));
    -0.64
     MacKenzie
    -0.63
     spillage
    -0.63
    Ой
    -0.61
     ddelweddau
    -0.60
     Chavez
    -0.60
    proken
    -0.59
    Positive Logits
     Moral
    0.83
     moral
    0.76
    PerformLayout
    0.76
    ulemon
    0.75
    moral
    0.73
    PreferredItem
    0.73
    оле
    0.73
    Mor
    0.72
    Morrison
    0.71
     Morality
    0.69
    Activation Density 0.002%

    No Known Activations