INDEX
    Explanations

    references to various aspects of humanity and human experience

    Negative Logits
      Token          Logit
      "umper"        -0.19
      "appen"        -0.18
      " Humanity"    -0.18
      " Human"       -0.18
      "_human"       -0.17
      "eson"         -0.17
      "gers"         -0.17
      " Horton"      -0.16
      "gable"        -0.16
      " human"       -0.15
    Positive Logits
      Token          Logit
      " beings"       0.41
      "ely"           0.34
      "oids"          0.33
      "istic"         0.33
      "itarian"       0.31
      "izing"         0.26
      "ly"            0.26
      "-machine"      0.26
      "ized"          0.26
      "made"          0.26
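In sparse-autoencoder dashboards, top and bottom logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. The page does not state how its values were computed, so the sketch below assumes that method. `decoder_dir`, `unembed`, and the four-token vocabulary are hypothetical toy values, not the model's actual weights.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Assumed method: project the decoder direction through the unembedding.

    unembed has shape d_model x vocab_size (one row per residual dimension).
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(effects: List[float], vocab: List[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive

# Hypothetical toy model: d_model = 2, vocabulary of four tokens.
vocab = [" beings", " Humanity", "oids", "umper"]
decoder_dir = [1.0, 0.5]
unembed = [[0.3, -0.1, 0.2, -0.1],
           [0.2, -0.1, 0.2, -0.2]]

neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

With real weights, `vocab` would be the tokenizer's full vocabulary and `k` would be 10 to match the lists above.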
    Activation Density: 0.046%
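Activation density is usually reported as the share of sampled tokens on which the feature's activation is above zero. Assuming that definition (the page gives neither its sample size nor its threshold), 0.046% corresponds to roughly 46 firing tokens per 100,000. The sketch below uses a hypothetical activation sample; `activation_density` and its `threshold` parameter are illustrative names, not part of any particular library.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 100,000 tokens, of which 46 fire.
acts = [0.0] * 100_000
for i in range(46):
    acts[i * 2_000] = 1.5

print(f"{activation_density(acts):.3%}")  # → 0.046%
```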

    No Known Activations