INDEX
    Explanations

    personal pronouns and verbs related to actions or intentions expressed in the present tense

    expressions related to personal experience or perspectives

    Negative Logits
    "similar"      -0.71
    "sequ"         -0.66
    " similarly"   -0.64
    "juries"       -0.64
    "ventions"     -0.63
    "uthor"        -0.61
    "ournal"       -0.59
    "rities"       -0.58
    " harms"       -0.58
    "pox"          -0.57
    Positive Logits
    " Nare"        0.66
    " behav"       0.65
    " boils"       0.64
    "Alias"        0.64
    " endings"     0.64
    "_-_"          0.63
    " nutshell"    0.62
    "!."           0.62
    "liest"        0.62
    "AME"          0.62
    Act Density 0.227%

    No Known Activations