INDEX
    Explanations
    Negative Logits
    " MR"              -0.07
    "Por"              -0.06
    "_br"              -0.06
    "America"          -0.06
    (blank)            -0.06
    " inflammatory"    -0.06
    "_BOOK"            -0.06
    " sant"            -0.06
    " Por"             -0.06
    " hvor"            -0.06
    Positive Logits
    "нося"             0.07
    " Lottery"         0.07
    "\tcan"            0.06
    "структор"         0.06
    " domaine"         0.06
    " Katrina"         0.06
    (undecodable)      0.06
    " interacts"       0.06
    "不存在"            0.06
    "orestation"       0.06
    Act Density 0.014%

    No Known Activations