INDEX
    Explanations

    sentences related to ethical values, social justice, and political commentary

    Negative Logits
    Token         Logit
    " suspic"     -1.50
    " excru"      -1.47
    " reluct"     -1.43
    " embra"      -1.41
    " Perci"      -1.40
    " inev"       -1.38
    " accla"      -1.37
    " impra"      -1.37
    " compen"     -1.36
    " increa"     -1.34
    Positive Logits
    Token           Logit
    " ones"         0.86
    " which"        0.69
    "ones"          0.66
    " Ones"         0.63
    " whose"        0.63
    "which"         0.62
    " including"    0.62
    " where"        0.61
    " sahiptir"     0.60
    " olyan"        0.59
    Activation Density: 0.314%

    No Known Activations