INDEX
    Explanations

    references to political actions and beliefs

    Negative Logits
    "-"                 -0.59
    " infamous"         -0.58
    " unsuccessfully"   -0.56
    "secret"            -0.56
    " nicknamed"        -0.55
    "Downloadha"        -0.54
    "irin"              -0.53
    "unknown"           -0.52
    "amon"              -0.52
    "cheon"             -0.52
    Positive Logits
    " ASAP"             0.85
    " transparency"     0.83
    " accountability"   0.83
    " accountable"      0.79
    " unbiased"         0.79
    " sooner"           0.79
    "respect"           0.79
    " honest"           0.78
    "cknow"             0.78
    " decency"          0.77
    Act Density 0.963%

    No Known Activations