INDEX
    Explanations

    descriptions related to news or current events

    references to specific political figures and related events

    Negative Logits
    Token          Logit
    "@#&"          -0.81
    "politics"     -0.70
    "learn"        -0.69
    "profits"      -0.69
    "needed"       -0.67
    "ventions"     -0.66
    " manufact"    -0.65
    "idays"        -0.65
    "Prem"         -0.64
    " Pwr"         -0.63
    Positive Logits
    Token            Logit
    " kneeling"      1.29
    " caption"       1.20
    " silhou"        1.16
    " grinning"      1.15
    " smiling"       1.15
    " silhouette"    1.11
    " flanked"       1.08
    " handcuffed"    1.00
    " purportedly"   0.98
    " decap"         0.98
    Activation Density: 0.404%

    No Known Activations