INDEX
    Explanations

    mentions of specific names or entities in news articles

    proper nouns, specifically names of individuals and entities

    Negative Logits
    otine         -0.84
    izations      -0.84
    illian        -0.79
    ivals         -0.79
    ians          -0.76
    urgy          -0.76
    icked         -0.75
    ais           -0.73
    ous           -0.73
     Pengu        -0.72
    Positive Logits
     Dee          1.00
    pling         0.92
    zie           0.89
    bris          0.86
    gradation     0.83
    lde           0.82
    ples          0.80
    velop         0.78
    plin          0.78
    ble           0.77
    Activation Density: 0.043%

    No Known Activations