INDEX
    Explanations

    words related to controversial or important topics

    references to specific problems or challenges

    Negative Logits
    Token            Logit
    "urses"          -0.85
    "zin"            -0.84
    "bsite"          -0.80
    "inav"           -0.77
    "ondon"          -0.77
    "alt"            -0.76
    "thood"          -0.74
    "ancies"         -0.74
    "htaking"        -0.73
    "ellow"          -0.73

    Positive Logits
    Token            Logit
    " issue"          0.99
    "Issue"           0.89
    "naires"          0.82
    " Issue"          0.82
    "DonaldTrump"     0.81
    " HRC"            0.77
    " Issues"         0.76
    " issues"         0.73
    " plag"           0.73
    "Iss"             0.69
    Activation Density: 0.034%

    No Known Activations