INDEX
    Explanations

    phrases indicating denial or refusal

    negations or phrases indicating denial or lack of action

    Negative Logits
    Token             Logit
    " LIFE"           -0.72
    " Beaut"          -0.71
    "ortality"        -0.65
    " beautifully"    -0.64
    "ngth"            -0.64
    "ersed"           -0.64
    " SPACE"          -0.64
    " badass"         -0.63
    " Survival"       -0.63
    " Kinnikuman"     -0.62
    Positive Logits
    Token             Logit
    " condone"        1.06
    " prejud"         0.90
    " comment"        0.87
    " regret"         0.86
    " prejudice"      0.82
    " speculate"      0.82
    " tolerate"       0.82
    " commenting"     0.79
    " jeopard"        0.78
    " interfere"      0.77
    Activation Density: 0.237%

    No Known Activations