INDEX
    Explanations

    personal pronouns and words related to personal experiences and beliefs

    references to personal experiences and feelings related to everyday life

    Negative Logits
    Token               Logit
    "lyak"              -0.77
    "gart"              -0.68
    "odan"              -0.61
    " reciproc"         -0.59
    " unsurprisingly"   -0.56
    "RAFT"              -0.55
    " majorities"       -0.55
    " winner"           -0.55
    "YN"                -0.55
    "proxy"             -0.54
    Positive Logits
    Token               Logit
    "pires"             0.95
    " imaginable"       0.95
    "except"            0.93
    " except"           0.88
    "ãĤ¨ãĥ«"            0.86
    "abilia"            0.82
    " EVER"             0.77
    " ever"             0.75
    "ãĤ´"               0.72
    "including"         0.72
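
Two of the positive-logit tokens ("ãĤ¨ãĥ«" and "ãĤ´") look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to decode them, assuming the model uses a GPT-2-style byte-to-unicode table. That assumption is not confirmed by this page, and the helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode():
    """Build a byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Assumption: the tokenizer behind this dashboard uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into UTF-8 text.

    Only meaningful for tokens still in their encoded form; a token that
    contains a literal space would raise KeyError here.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¨ãĥ«"))  # エル
print(decode_token("ãĤ´"))      # ゴ
```

Under that assumption, both tokens decode to Japanese katakana. The other tokens in the lists are plain ASCII and appear to be displayed already decoded, so they need no such step.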
    Activation Density: 0.166%

    No Known Activations