INDEX
    Explanations

    statements pertaining to morality

    Negative Logits
    Token          Logit
    "xual"         -0.80
    "gow"          -0.78
    "rams"         -0.73
    "rooms"        -0.71
    " Lup"         -0.70
    "upon"         -0.69
    "abee"         -0.68
    "lers"         -0.67
    "-+"           -0.67
    "minster"      -0.67

    Positive Logits
    Token          Logit
    "istic"         1.14
    "izing"         1.06
    " hazard"       1.05
    " indignation"  1.03
    " compass"      1.03
    "ising"         1.02
    " obligation"   1.01
    "istically"     0.97
    " dile"         0.97
    " conscience"   0.96
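Logit lists like these are commonly produced by multiplying a feature's decoder direction by the model's unembedding matrix and ranking the resulting per-token values. The dashboard does not state its exact method, so the sketch below assumes that procedure and ignores details such as any final LayerNorm. The helper `top_logit_effects` and the toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction onto the
    unembedding and return the k most negative and k most positive tokens."""
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    # Logit effect on token t: dot product of decoder_dir with column t of unembed.
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # reverse to put the most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 made-up tokens.
    neg, pos = top_logit_effects(
        [1.0, 0.5],
        [[0.25, -1.0, 0.75, 0.0], [0.5, 0.0, 0.5, -0.5]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print(neg)  # [('b', -1.0), ('d', -0.25)]
    print(pos)  # [('c', 1.0), ('a', 0.5)]
```

Because the ranking is a plain sort over every vocabulary entry, subword fragments such as "istic" or "xual" can appear alongside whole words, as in the lists above.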
    Activation Density: 0.062%
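Activation density is usually read as the share of tokens in a sampled dataset on which the feature is active, so 0.062% corresponds to roughly 62 tokens in every 100,000. A minimal sketch, assuming a token counts as active when the feature's activation exceeds a threshold of zero; the dashboard does not state its threshold or sample size, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature fires (strictly above the threshold).
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 62 active tokens out of 100,000 sampled tokens.
    acts = [0.0] * 99_938 + [1.0] * 62
    print(activation_density(acts))
```

The zero default suits ReLU-style sparse features, whose inactive activations are exactly zero; a positive threshold would ignore very weak firings.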

    No Known Activations