INDEX
    Explanations

    the name "Brad" with varying activations

    the repeated mention of the name "Brad."

    Negative Logits
    " referen"        -0.70
    " eleph"          -0.69
    "ktop"            -0.68
    " subsistence"    -0.67
    "VALUE"           -0.66
    " derog"          -0.66
    "phis"            -0.65
    " versa"          -0.64
    "Magikarp"        -0.63
    " wiret"          -0.63
    Positive Logits
    "shaw"            1.20
    "enton"           1.13
    " Pitt"           1.01
    "ford"            0.96
    " Brad"           0.89
    "iago"            0.89
    "street"          0.85
    "anche"           0.83
    "nan"             0.83
    "bury"            0.82
    Activation Density: 0.011%

    No Known Activations