INDEX
    Explanations

    phrases related to self-related concepts or actions

    references to self-identity or self-related concepts

    Negative Logits
    Token (quoted)    Logit
    "IUM"             -0.75
    "ICAN"            -0.71
    "nis"             -0.69
    "ONT"             -0.67
    "pheus"           -0.62
    "ondo"            -0.62
    " Nights"         -0.61
    "dayName"         -0.61
    "andum"           -0.60
    "oS"              -0.60
    Positive Logits
    Token (quoted)    Logit
    "lessly"          0.97
    "same"            0.95
    " esteem"         0.94
    "destruct"        0.93
    " destruct"       0.93
    "-"               0.93
    "ridges"          0.90
    " explanatory"    0.88
    " proclaimed"     0.86
    "less"            0.83
    Activation Density: 0.032%

    No Known Activations