INDEX
    Explanations

    names of political figures

    Negative Logits
    bed        -0.73
    bing       -0.71
    stress     -0.66
    cki        -0.65
     Cascade   -0.63
    guided     -0.63
    PRESS      -0.60
    unity      -0.59
    ending     -0.58
    ffer       -0.57
    Positive Logits
    iage       1.00
    ials       0.97
    ians       0.95
    iments     0.94
    igans      0.94
    iants      0.92
    iating     0.92
    ial        0.91
    iane       0.91
    teenth     0.89
    Act Density: 0.183%

    No Known Activations