INDEX
    Explanations

    references to moral concepts and ethical considerations

    Negative Logits
    Token        Logit
    "xual"       -0.79
    "gow"        -0.74
    "rooms"      -0.73
    "rams"       -0.72
    "lers"       -0.70
    "upon"       -0.70
    "minster"    -0.69
    " Lup"       -0.68
    "abee"       -0.67
    "hips"       -0.67
    Positive Logits
    Token             Logit
    "istic"           1.15
    "izing"           1.10
    " hazard"         1.06
    "ising"           1.05
    " compass"        1.04
    " indignation"    1.01
    " obligation"     0.98
    "istically"       0.97
    "ised"            0.96
    " dile"           0.96
    Activation Density: 0.032%

    No Known Activations