INDEX
    Explanations

    proper nouns related to politics and personalities

    mentions of specific individuals or entities linked to a narrative

    Negative Logits
    ocal      -0.81
    utical    -0.79
    ises      -0.77
    icate     -0.74
    UAL       -0.73
    ual       -0.73
    icals     -0.72
    iary      -0.72
     Mehran   -0.71
    ically    -0.70
    Positive Logits
    bench     0.77
    noon      0.75
    pole      0.74
    Stack     0.74
    tons      0.73
    tail      0.72
    butt      0.72
    mie       0.72
    bra       0.71
    ben       0.71
    Activation Density: 0.028%

    No Known Activations