    Explanations

    proper nouns, particularly names such as "Susan."

    mentions of the name "Susan"

    Negative Logits
    "ORD"        -0.80
    "iculty"     -0.78
    "cffffcc"    -0.72
    "unct"       -0.70
    "riter"      -0.65
    "olkien"     -0.64
    "erent"      -0.64
    "psey"       -0.64
    " warped"    -0.64
    "ebus"       -0.62
    Positive Logits
    "ne"          0.97
    "icide"       0.97
    "gha"         0.96
    "jit"         0.86
    "anne"        0.86
    "ja"          0.85
    "icides"      0.80
    "atan"        0.79
    "otte"        0.79
    "mination"    0.78
    Act Density 0.033%

    No Known Activations