INDEX
    Explanations

    sentences discussing human qualities or experiences

    references to the concept of being human

    Negative Logits
    Token           Logit
    "OHN"           -0.71
    "arella"        -0.69
    " Transcript"   -0.69
    "liga"          -0.67
    "forth"         -0.66
    "INO"           -0.66
    "rav"           -0.65
    "urations"      -0.65
    "armac"         -0.65
    "effective"     -0.65
    Positive Logits
    Token           Logit
    " beings"       1.38
    "itar"          1.20
    "itarian"       1.10
    "istic"         1.08
    "izing"         0.96
    "oids"          0.94
    "ized"          0.92
    "istically"     0.91
    "itary"         0.90
    " readable"     0.89
    Activation Density: 0.031%

    No Known Activations