INDEX
    Explanations

    proper nouns related to individuals, particularly ones named Alexander

    Negative Logits
    Token           Logit
    "neys"          -0.92
    "zee"           -0.88
    "kered"         -0.83
    "eling"         -0.82
    "rosse"         -0.80
    "atical"        -0.78
    "employment"    -0.77
    "eless"         -0.77
    "ths"           -0.75
    "elling"        -0.74
    Positive Logits
    Token           Logit
    " Gust"         0.94
    " Wang"         0.83
    " Luthor"       0.81
    " Cock"         0.80
    "opoulos"       0.77
    " Hamilton"     0.75
    " Payne"        0.74
    " Calder"       0.71
    " Anton"        0.71
    " Graham"       0.70
    Activation Density: 0.014%

    No Known Activations