INDEX
    Explanations

    vocabulary related to morality and morals

    Negative Logits
    "xual"         -1.18
    "rams"         -1.11
    "essions"      -1.10
    "lers"         -1.09
    " Pavilion"    -1.06
    "hips"         -1.04
    "kt"           -1.00
    "WER"          -1.00
    "hw"           -1.00
    "gow"          -0.99
    Positive Logits
    " hazard"         1.41
    " compass"        1.38
    "istic"           1.36
    "istically"       1.29
    " equival"        1.24
    "ising"           1.23
    " conscience"     1.21
    " indignation"    1.21
    " dile"           1.18
    "ised"            1.17
    Activation Density: 1.139%

    No Known Activations