INDEX
    Explanations

    phrases related to ethical behavior and societal issues

    Negative Logits
     severally   -0.55
     Manufact    -0.54
     pooh        -0.52
     Righ        -0.52
     accla       -0.50
     mew         -0.50
     snoopy      -0.50
     philanth    -0.49
     swee        -0.48
     Ename       -0.48
    Positive Logits
     whatsoever  0.98
     nor         0.67
     except      0.60
    Saluti       0.56
     anymore     0.55
    except       0.53
     estekak     0.51
    idać         0.50
    المشاركات    0.49
    <bos>        0.47
    Act Density 0.326%

    No Known Activations