    Explanations

    phrases and words related to moral judgment and ethical considerations

    Negative Logits
    хи
    -0.18
     od
    -0.17
     elegance
    -0.15
     Od
    -0.15
    itaire
    -0.14
    ourt
    -0.14
    .omg
    -0.14
    core
    -0.13
    redi
    -0.13
    rig
    -0.13
    Positive Logits
     because
    0.57
    because
    0.52
     porque
    0.48
    Because
    0.48
     Because
    0.46
     потому
    0.40
    因为
    0.39
     karena
    0.39
     omdat
    0.38
     perché
    0.37
    Act Density 0.217%

    No Known Activations