INDEX
    Explanations

    references to oneself or reflexive actions

    references to the concept of "self."

    Negative Logits
    Token           Logit
    "akings"        -0.76
    "microsoft"     -0.66
    " airspace"     -0.64
    " Amend"        -0.63
    " wedge"        -0.62
    "MSN"           -0.58
    " Learns"       -0.58
    "osa"           -0.57
    " arrivals"     -0.57
    "rought"        -0.56
    Positive Logits
    Token           Logit
    "ortium"        1.09
    "selves"        1.04
    "destruct"      0.92
    "self"          0.84
    "ridges"        0.83
    "theless"       0.83
    "terday"        0.81
    "same"          0.75
    "ridge"         0.73
    "acid"          0.73
    Activation Density: 0.014%

    No Known Activations