INDEX
    Explanations

    phrases related to the concept of self

    words and phrases indicating self-reference or self-descriptions

    Negative Logits
    Token       Logit
    "IUM"       -0.78
    "ICAN"      -0.78
    " Ashe"     -0.77
    " XIII"     -0.70
    "ONT"       -0.69
    "etter"     -0.68
    "rium"      -0.68
    "oice"      -0.65
    "oric"      -0.65
    "IENCE"     -0.64
    Positive Logits
    Token             Logit
    "destruct"        1.11
    "lessly"          1.05
    "-"               1.01
    "same"            1.01
    " destruct"       0.93
    " explanatory"    0.93
    "proclaimed"      0.92
    "ridges"          0.89
    "âĢij"            0.88
    "less"            0.86
    Activation Density: 0.016%

    No Known Activations