INDEX
    Explanations

    instances of moral judgment or criticism of behavior

    Negative Logits
    stad        -0.17
    iris        -0.15
    INLINE      -0.15
    馆          -0.14
     Ore        -0.14
    blade       -0.14
    quine       -0.14
    館          -0.14
    帖          -0.14
    ERE         -0.14
    Positive Logits
    idl         0.15
    abor        0.15
    شمالی       0.15
    imas        0.15
    ema         0.15
    テル        0.14
    ets         0.14
     Idol       0.14
    uib         0.14
     Honest     0.13
    Act Density 0.035%

    No Known Activations