INDEX
    Explanations

    references to the pronoun "you."

    Negative Logits
    "tings"     -0.15
    "jist"      -0.14
    "rais"      -0.14
    "ê±°ëŀĺ"    -0.14
    "nob"       -0.14
    "probably"  -0.14
    "inds"      -0.13
    "utron"     -0.13
    "ut"        -0.13
    "iable"     -0.13
    Positive Logits
    " ever"     0.24
    " haven"    0.24
    " hasn"     0.21
    " Haven"    0.19
    " hadn"     0.19
    " somehow"  0.17
    "haven"     0.17
    "essel"     0.17
    "squ"       0.16
    " EVER"     0.16
    Activation Density: 0.046%

    No Known Activations