INDEX
    Explanations

    the occurrence of the word "you" in various contexts

    Negative Logits
    lover      -0.16
    ower       -0.15
    board      -0.15
    ald        -0.15
    mus        -0.15
    ict        -0.15
    imit       -0.14
    igate      -0.14
    helper     -0.14
    idious     -0.14

    Positive Logits
    erdale      0.16
    ehir        0.16
    ’ll         0.16
    'll         0.15
    OMUX        0.15
    SELF        0.15
    erer        0.15
    ebek        0.15
    oldemort    0.14
    imli        0.14

    Activation density: 0.102%

    No Known Activations