INDEX
    Explanations

    statements discussing moral or ethical judgments

    phrases indicating moral judgments and ethical considerations

    Negative Logits
    76561             -0.70
    iping             -0.68
    acan              -0.66
    nants             -0.66
    arta              -0.64
    phabet            -0.62
    Advertisements    -0.62
    rongh             -0.61
    jong              -0.61
    mage              -0.60
    Positive Logits
     raining          0.88
    ceivable          0.73
     impossible       0.71
     coincidence      0.70
     folly            0.69
     advisable        0.64
    EC                0.63
     to               0.62
    ifiable           0.61
     conceivable      0.60
    Act Density 0.378%

    No Known Activations