    Explanations

    the word "Me" with varying degrees of emphasis indicated by activation strength

    the repetition of the word "Me"

    Negative Logits
    Token          Logit
    "UAL"          -0.72
    "ctl"          -0.66
    "icably"       -0.65
    "flush"        -0.65
    "acing"        -0.64
    " flush"       -0.63
    "ulative"      -0.62
    "ript"         -0.62
    "itiveness"    -0.62
    "OWER"         -0.60
    Positive Logits
    Token          Logit
    " Me"          3.48
    "Me"           2.45
    " ME"          2.04
    "me"           1.77
    " Us"          1.49
    " Meh"         1.41
    "ME"           1.36
    " Him"         1.33
    " My"          1.27
    " me"          1.24
    Act Density 0.012%

    No Known Activations