INDEX
    Explanations

    mentions of conscience or moral principles

    references to moral awareness or ethical considerations

    Negative Logits
    "eri"          -0.84
    "olds"         -0.70
    "ORPG"         -0.68
    "ramid"        -0.67
    "enty"         -0.67
    "ishers"       -0.65
    " Extended"    -0.65
    "vati"         -0.65
    "atern"        -0.65
    "anas"         -0.65
    Positive Logits
    " conscience"      0.92
    "fulness"          0.87
    "ful"              0.69
    "OME"              0.67
    " disposition"     0.66
    " compass"         0.66
    "less"             0.65
    "ngth"             0.65
    "FUL"              0.65
    " disobedience"    0.64
    Activation Density: 0.018%

    No Known Activations