INDEX
    Explanations

    titles and roles of officials, especially those related to government or political positions

    Negative Logits
    orro
    -0.17
    stead
    -0.17
    enburg
    -0.15
    APS
    -0.15
    st
    -0.15
    ÙħÙĪØ¯
    -0.15
    ilt
    -0.15
    ternet
    -0.15
     detriment
    -0.14
    jun
    -0.14
    Positive Logits
    noop
    0.16
    ignum
    0.15
    resp
    0.15
    æ¸Ī
    0.14
    linger
    0.14
    ibel
    0.14
    ibe
    0.14
     minded
    0.14
    _bw
    0.14
    auc
    0.14
    Act Density 0.019%

    No Known Activations