INDEX
    Explanations

    personal pronouns followed by an action or description

    first-person pronouns and assertions of personal experience

    Negative Logits
    Token              Logit
    " unavailable"     -0.64
    "cknow"            -0.63
    " considerably"    -0.62
    "empt"             -0.59
    " independently"   -0.58
    " uncertain"       -0.58
    " unable"          -0.58
    "knowledge"        -0.57
    "viol"             -0.56
    "Xi"               -0.56
    Positive Logits
    Token              Logit
    " meant"           0.90
    " envisioned"      0.89
    " hoped"           0.84
    " preached"        0.80
    " stri"            0.78
    " boils"           0.78
    " Wanted"          0.77
    " intended"        0.77
    " wanted"          0.76
    " supposed"        0.72
    Activation Density: 0.158%

    No Known Activations