INDEX
    Explanations

    instances related to self-reflection, personal experience, and self-identity

    Negative Logits
    " XIII"     -0.71
    "heny"      -0.68
    "cheon"     -0.68
    "SHIP"      -0.68
    "rice"      -0.68
    "ī"         -0.67
    "ondo"      -0.66
    " Ashe"     -0.66
    "endar"     -0.64
    "yk"        -0.63
    Positive Logits
    "destruct"        1.08
    "conscious"       0.88
    "same"            0.87
    "lessly"          0.85
    " destruct"       0.82
    "proclaimed"      0.82
    " explanatory"    0.81
    " esteem"         0.79
    " conscious"      0.76
    "pres"            0.75
    Activation Density: 5.191%

    No Known Activations