INDEX
    Explanations

    mentions of specific individuals by name combined with a position or title

    the presence of names that start with "Ade."

    Negative Logits
    ipeg        -0.85
    ivity       -0.80
    urers       -0.76
    lessness    -0.70
    enegger     -0.66
    orem        -0.66
    sburgh      -0.66
    urally      -0.66
    iew         -0.66
    imation     -0.66
    Positive Logits
    lled        0.92
    cki         0.85
    llan        0.82
    vice        0.79
    lli         0.78
    lla         0.78
    hani        0.76
    utic        0.73
    aways       0.72
    hyde        0.72
    Activation Density: 0.040%

    No Known Activations