    Explanations

    instances of the word "had."

    Negative Logits
    Token         Logit
    "PI"          -0.67
    "bery"        -0.66
    "owe"         -0.65
    "orph"        -0.64
    "ethy"        -0.64
    "ety"         -0.63
    "âϦ"          -0.62
    "bie"         -0.60
    " anymore"    -0.59
    "forward"     -0.59
    Positive Logits
    Token            Logit
    " been"          1.01
    "iths"           1.01
    " originally"    0.98
    " previously"    0.97
    " begun"         0.97
    " hoped"         0.95
    " undergone"     0.94
    " initially"     0.83
    " gotten"        0.82
    " flown"         0.79
    Activation Density: 0.139%

    No Known Activations