    Negative Logits
    Token            Logit
    " Wheels"        -0.08
    " Marg"          -0.06
    " Enough"        -0.06
    " borrowing"     -0.06
    " Буд"           -0.06
    "npm"            -0.06
    " قتل"           -0.06
    " lies"          -0.06
    "ора"            -0.06
    (not shown)      -0.06
    Positive Logits
    Token            Logit
    " sap"           0.07
    "пра"            0.06
    "abase"          0.06
    " майже"         0.06
    ";a"             0.06
    "prov"           0.06
    "ेवल"            0.06
    ";c"             0.06
    " Brooklyn"      0.06
    " sex"           0.06
    Activation Density: 0.022%

    No Known Activations