    Explanations

    mentions of prestigious awards, specifically the Nobel Prize

    references to prestigious awards, particularly the Nobel Prize and the Pulitzer Prize

    Negative Logits
    "den"          -0.82
    "rir"          -0.72
    "icago"        -0.66
    "alter"        -0.65
    "strong"       -0.65
    "nes"          -0.63
    "aturdays"     -0.63
    "icas"         -0.61
    "ocal"         -0.61
    "pher"         -0.61
    Positive Logits
    " Prize"        1.42
    " laureate"     1.35
    " prize"        1.07
    " Winners"      1.06
    " Winner"       1.05
    " prizes"       0.99
    " awarded"      0.95
    " award"        0.92
    "Winner"        0.92
    " winner"       0.90
    Activation Density: 0.007%

    No Known Activations