INDEX
    Explanations

    Instances of the words "tell" and "told".

    Negative Logits
    "езд"         -0.16
    "ial"         -0.15
    "olar"        -0.15
    "ic"          -0.14
    "esters"      -0.14
    "bole"        -0.14
    "zu"          -0.14
    "utt"         -0.14
    " Overrides"  -0.14
    "estr"        -0.13
    Positive Logits
    " tales"     0.24
    " stories"   0.22
    "ingly"      0.21
    " tale"      0.21
    " us"        0.20
    " fortunes"  0.19
    " told"      0.19
    " lies"      0.18
    " me"        0.18
    " tell"      0.17
    Activation Density: 0.045%

    No Known Activations