INDEX
    Explanations

    phrases related to political figures and events

    mentions of specific organizations, news agencies, or entities related to media

    Negative Logits
    Token           Logit
    "osponsors"     -0.72
    "taboola"       -0.70
    " differe"      -0.63
    "abor"          -0.62
    "ordinate"      -0.60
    "abet"          -0.59
    " cumbers"      -0.59
    "ité"           -0.59
    "ospons"        -0.58
    " anat"         -0.58
    Positive Logits
    Token           Logit
    " 11"           1.02
    " 41"           1.01
    " 19"           1.00
    " 31"           0.99
    " 17"           0.98
    " 23"           0.98
    " 39"           0.98
    " 29"           0.97
    " 21"           0.97
    " 27"           0.96
    Activation Density: 0.044%

    No Known Activations