INDEX
    Explanations

    phrases indicating ethical judgments or considerations

    phrases indicating moral evaluations or judgments

    Negative Logits
    "quickShipAvailable"   -0.69
    "Orig"                 -0.61
    ";;;;;;;;"             -0.60
    "uli"                  -0.60
    "ById"                 -0.58
    " benefiting"          -0.56
    "riot"                 -0.56
    " Oops"                -0.56
    " dilig"               -0.56
    " Ended"               -0.56
    Positive Logits
    " visualize"           0.84
    "adies"                0.81
    " practise"            0.79
    " avoid"               0.79
    "ads"                  0.78
    " automate"            0.77
    "ggles"                0.76
    " accomplish"          0.76
    " convince"            0.75
    "wered"                0.75
    Act Density 0.127%

    No Known Activations