INDEX
    Explanations

    words related to moral values and ethics

    references to moral concepts and ethical discussions

    Negative Logits
    "xual"       -0.83
    "rams"       -0.70
    "-+"         -0.69
    " Lup"       -0.69
    "nces"       -0.68
    " Twice"     -0.68
    "upon"       -0.67
    "WER"        -0.66
    "gow"        -0.65
    "minster"    -0.65
    Positive Logits
    "istic"          1.15
    "izing"          1.12
    "ising"          1.09
    " hazard"        1.03
    " compass"       1.03
    "ised"           1.00
    " indignation"   0.99
    "izational"      0.97
    " equival"       0.96
    "IZE"            0.95
    Activation Density: 0.034%

    No Known Activations