INDEX
    Explanations

    mentions of the name "Don."

    Negative Logits
    Token        Logit
    "View"       -0.67
    "C"          -0.66
    "In"         -0.64
    "op"         -0.62
    "<eos>"      -0.61
    "de"         -0.61
    "Before"     -0.60
    "opo"        -0.59
    "F"          -0.59
    "The"        -0.58

    Positive Logits
    Token        Logit
    " Isn"       1.45
    "Isn"        1.44
    "doesn"      1.34
    " Shouldn"   1.33
    " Wasn"      1.31
    " wouldn"    1.31
    " Aren"      1.30
    "Doesn"      1.30
    " weren"     1.30
    " shouldn"   1.29

    Activation Density: 0.156%

    No Known Activations