INDEX
    Explanations

    self-referential language, such as words that indicate self-identity or self-perception

    references to self-identification and personal perspective

    Negative Logits
     Meth               -0.70
    --+                 -0.67
     airst              -0.62
     Lear               -0.61
     Via                -0.61
     methamphetamine    -0.60
    ARB                 -0.60
     Madden             -0.60
     frogs              -0.59
    ////                -0.59

    Positive Logits
    limits              0.76
    zbek                0.75
    ãĥ¤                 0.72
    tical               0.69
    animous             0.69
    priv                0.68
    agi                 0.67
     favorably          0.66
    ilitarian           0.66
    polit               0.66

    Activation Density 0.135%

    No Known Activations