INDEX
    Explanations

    references to political claims and investigations

    Negative Logits
    isu        -0.15
    仮         -0.14
    borg       -0.14
    alem       -0.14
     Vogue     -0.14
    assin      -0.14
    hart       -0.13
     Queens    -0.13
     fashion   -0.13
    rag        -0.13
    Positive Logits
     Pants     0.18
     verdad    0.17
    жд         0.17
    ersonic    0.16
    mbH        0.15
    sonian     0.15
    mites      0.15
    _fact      0.15
    yn         0.14
     Accuracy  0.14
    Activation Density: 0.006%

    No Known Activations