INDEX
    Explanations

    mentions of, or references to, the word "human."

    references to human characteristics and experiences

    Negative Logits
    "arella"       -0.82
    "è¦"           -0.81
    "armac"        -0.76
    "OHN"          -0.68
    "urations"     -0.68
    "liga"         -0.68
    "é¾įå"         -0.67
    "abb"          -0.67
    "effective"    -0.66
    "kick"         -0.65
    Positive Logits
    " beings"       1.34
    "itar"          1.08
    "oids"          1.07
    "itarian"       1.02
    " readable"     0.99
    "istic"         0.96
    " embryonic"    0.93
    "oid"           0.92
    "ized"          0.90
    "izing"         0.88
    Activation density: 0.026%

    No Known Activations