    Explanations

    references to moral judgments and ethical considerations

    Negative Logits
    "transQ"            -0.61
    "uxxxx"             -0.57
    " isSet"            -0.56
    "CPL"               -0.55
    " GeoNames"         -0.54
    "arot"              -0.53
    "moveToFirst"       -0.53
    "inaison"           -0.53
    " ویکی‌آمباردا"      -0.52
    "ANNEL"             -0.52
    Positive Logits
    " moral"            2.89
    " ethical"          2.67
    " Moral"            2.52
    "moral"             2.48
    "Moral"             2.42
    " ethics"           2.32
    " Ethical"          2.29
    "ethical"           2.26
    " morality"         2.21
    " morals"           2.19
    Activation Density: 0.105%

    No Known Activations