    Explanations

    mentions of the name "Joe" followed by a numerical activation value

    the repeated mention of the name "Joe."

    Negative Logits
      Token                                Logit
      "NESS"                               -0.91
      "hips"                               -0.78
      "rawdownloadcloneembedreportprint"   -0.77
      "ties"                               -0.70
      "ample"                              -0.68
      "ancy"                               -0.67
      "ioned"                              -0.65
      "imental"                            -0.65
      "peed"                               -0.64
      "seeing"                             -0.63
    Positive Logits
      Token           Logit
      " Biden"        1.10
      " Arpaio"       1.07
      " Rog"          0.91
      " Russo"        0.88
      " Pes"          0.87
      " Scarborough"  0.87
      " Camel"        0.85
      " Gibbs"        0.84
      " Rao"          0.82
      "ppo"           0.80
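The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. Below is a minimal sketch of one common way such lists are produced, assuming (this page does not confirm it) that each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The functions `logit_effects` and `top_and_bottom`, and the toy inputs, are hypothetical and for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumption: score(token) = decoder_dir . unembed[:, token].
# All names below are hypothetical, not a real dashboard API.

def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # shape [d_model][vocab_size]
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (token, score) for every token in the vocabulary."""
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with unembedding column j.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores, k):
    """Split scores into the k most positive and k most negative tokens."""
    ranked = sorted(scores, key=lambda ts: ts[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of three tokens.
vocab = ["A", "B", "C"]
decoder_dir = [1.0, 0.0]
unembed = [[0.5, -1.0, 2.0],
           [3.0, 3.0, 3.0]]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(positive, negative)  # [('C', 2.0)] [('B', -1.0)]
```

With k set to 10 over a full vocabulary, this would yield two ten-entry lists shaped like the ones on this page.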
    Activation Density: 0.033%
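Activation density is commonly reported as the share of token positions where the feature fires. Under that reading, 0.033% means roughly one token in 3,000. The sketch below assumes that definition with a firing threshold of zero. The function `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the firing rate over all token positions.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 3 firing positions out of 10,000 tokens.
acts = [0.0] * 9_997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.030%
```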

    No Known Activations