INDEX
    Explanations
    NEGATIVE LOGITS
    Token          Value
    ‌خ              -0.08
    -it            -0.07
    'am            -0.07
    (main          -0.07
    (orig          -0.06
    рес            -0.06
     ра            -0.06
    (VALUE         -0.06
    nero           -0.06
    (equal         -0.06
    POSITIVE LOGITS
    Token          Value
    overall        0.07
     çalışmalar    0.06
     державного    0.06
    -produ         0.06
    .Company       0.06
    ehir           0.06
                   0.06
     lesbian       0.06
     timp          0.06
     зада          0.06
    Activation Density: 0.042%

    No Known Activations