INDEX
    Explanations

    statements expressing differing levels of correctness or morality, along with recommendations or judgments about actions

    phrases that express mistakes or wrongdoings related to moral or ethical judgments

    Negative Logits
    " strengths"     -0.70
    "lator"          -0.68
    " reperto"       -0.67
    "aukee"          -0.63
    " calmed"        -0.63
    " linem"         -0.63
    " rapport"       -0.62
    "hani"           -0.62
    "uli"            -0.61
    " delightful"    -0.61
    Positive Logits
    " underestimate"    0.88
    " knowingly"        0.83
    " anymore"          0.81
    " presume"          0.80
    " oppose"           0.79
    " condone"          0.78
    " whatsoever"       0.77
    " anyone"           0.75
    " accuse"           0.75
    " impose"           0.75
    Activation Density: 0.207%

    No Known Activations