INDEX
    Explanations

    keywords related to issues of morality, self-interest, and personal attributes or actions

    terms related to societal behavior and interpersonal relationships

    Negative Logits
    "]="         -0.66
    "}}}"        -0.66
    " }}"        -0.63
    "ゴン"       -0.61
    " Cheong"    -0.59
    " TOD"       -0.57
    " )]"        -0.56
    "atorium"    -0.56
    "writ"       -0.55
    " fixme"     -0.55
    Positive Logits
    " that"      1.31
    "that"       1.07
    " THAT"      1.04
    " who"       0.94
    " which"     0.87
    " whom"      0.86
    "That"       0.78
    "who"        0.78
    " That"      0.77
    " whose"     0.76
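The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Below is a minimal sketch of such a ranking. It assumes, though this page does not say so, that each score is the dot product of the feature's decoder direction with a token's column of the model's unembedding matrix. The names `top_logit_effects`, `decoder_vec`, and `W_U`, and the toy numbers in the usage example, are hypothetical and not taken from this page.

```python
import heapq
from typing import List, Tuple


def top_logit_effects(
    decoder_vec: List[float],
    W_U: List[List[float]],
    vocab: List[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by projecting a feature direction through the unembedding.

    Hypothetical shapes: decoder_vec has length d_model, W_U is
    d_model x vocab_size (one column per token), vocab has vocab_size entries.
    Returns the k most positive and the k most negative (token, score) pairs.
    """
    d_model = len(decoder_vec)
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        s = sum(decoder_vec[i] * W_U[i][j] for i in range(d_model))
        scores.append((token, s))
    # nlargest / nsmallest avoid sorting the whole vocabulary when k is small.
    positive = heapq.nlargest(k, scores, key=lambda pair: pair[1])
    negative = heapq.nsmallest(k, scores, key=lambda pair: pair[1])
    return positive, negative


if __name__ == "__main__":
    # Toy example with d_model = 2 and a four-token vocabulary (made-up values).
    vocab = [" that", " who", "]=", " fixme"]
    W_U = [[1.0, 0.8, -0.5, -0.4],
           [0.6, 0.2, -0.3, -0.2]]
    pos, neg = top_logit_effects([1.0, 0.5], W_U, vocab, k=2)
    print([t for t, _ in pos])  # → [' that', ' who']
    print([t for t, _ in neg])  # → [']=', ' fixme']
```

Quoting the tokens, as in the tables above, keeps the leading space that distinguishes `" that"` from `"that"`.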
    Activation Density 0.282%
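Activation density is commonly reported as the share of sampled tokens on which a feature fires. The following minimal sketch rests on that assumption and treats a token as active when its activation is strictly greater than zero. The threshold, the helper name `activation_density`, and the toy data are hypothetical. This page does not state how its figure was computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active only when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: 3 active tokens out of 1,000 sampled.
    print(activation_density([0.0] * 997 + [0.4, 1.2, 0.9]))  # → 0.3
```

Under this definition, a density of 0.282% corresponds to roughly 2.8 active tokens per 1,000 tokens sampled.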

    No Known Activations