INDEX
    Explanations

    names specifically "Simon"

    repeated mentions of the name "Simon."

    Negative Logits
    ttes      -0.68
    een       -0.67
    ingo      -0.64
    ITNESS    -0.61
    ations    -0.59
    laws      -0.57
    reek      -0.57
    tub       -0.56
    tc        -0.56
    utic      -0.55
    Positive Logits
     Says          0.92
    etta           0.88
    etti           0.85
     Gerr          0.84
    ãĤ¨ãĥ«         0.76
    ovic           0.73
    displayText    0.73
    itars          0.70
    idis           0.70
     Fraser        0.70
    Act Density 0.037%

    No Known Activations