INDEX
    Explanations

    names related to politics and academia

    mentions of specific names and the word "socks."

    Negative Logits
    "thal"      -0.87
    "med"       -0.82
    " joints"   -0.78
    "zon"       -0.71
    "lda"       -0.68
    "headed"    -0.67
    "pton"      -0.67
    " joint"    -0.66
    "gary"      -0.65
    "ibur"      -0.64

    Positive Logits
    "imental"    0.91
    "ivities"    0.85
    "ipation"    0.82
    "orship"     0.79
    " Davies"    0.78
    "ilon"       0.76
    "ieri"       0.75
    "iar"        0.73
    "atsuki"     0.72
    "rolet"      0.72
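
    The two lists are the ten tokens with the most negative and the most positive values for this feature. The sketch below shows one way such ranked lists could be pulled from a token-to-value mapping. It assumes the values are already available as a Python dict. The function name `top_logits` is hypothetical, and the sample is a subset of the values listed above. Leading spaces inside tokens are significant.

```python
import heapq

def top_logits(logit_by_token, k=10):
    """Return the k most negative and k most positive (token, value) pairs."""
    items = logit_by_token.items()
    # nsmallest returns ascending order; nlargest returns descending order.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive

# Subset of the values listed above; note the leading space in " joints" and " Davies".
sample = {"thal": -0.87, "med": -0.82, " joints": -0.78,
          "imental": 0.91, "ivities": 0.85, " Davies": 0.78}
neg, pos = top_logits(sample, k=2)
print(neg)  # [('thal', -0.87), ('med', -0.82)]
print(pos)  # [('imental', 0.91), ('ivities', 0.85)]
```
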
    Activation Density: 0.039%
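
    Activation density is the percentage of token positions on which the feature is active. At 0.039%, that is roughly 39 of every 100,000 tokens. The sketch below assumes "active" means an activation strictly greater than a threshold of zero. The function name `activation_density` and its `threshold` parameter are illustrative, not taken from the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical data: 39 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(39):
    acts[i * 1000] = 1.5
print(f"{activation_density(acts) * 100:.3f}%")  # 0.039%
```
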

    No Known Activations