    Negative Logits
        " Jez"          -0.08
        " cosmet"       -0.08
        " costumes"     -0.08
        " Fortaleza"    -0.08
        " Beirut"       -0.07
        " weapon"       -0.07
        " Weapons"      -0.07
        " moda"         -0.07
        " Unlike"       -0.07
        " lez"          -0.07
    Positive Logits
        [unrecoverable token]   0.10
        " أم"           0.08
        "Article"       0.08
        "каун"          0.08
        " кто"          0.08
        " مخالف"        0.08
        " indig"        0.08
        "rai"           0.08
        "ANS"           0.08
        "чилик"         0.07
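    One common way to produce negative and positive logit lists like these is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive entries. The sketch below assumes that method. `top_logit_effects`, `decoder_dir`, `W_U`, and the toy token names and weights are hypothetical and stand in for real model weights.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive logit effects.

    Assumption: the effect on each vocabulary token is the dot product of the
    feature's decoder direction with that token's column of W_U.
    """
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(effects)                          # ascending order
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy setup: d_model = 2, vocabulary of 4 tokens (hypothetical values).
vocab = ["a", "b", "c", "d"]
W_U = np.array([[-0.08, 0.08, -0.07, 0.10],
                [0.50, -0.20, 0.30, 0.00]])
decoder_dir = np.array([1.0, 0.0])

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('a', -0.08), ('c', -0.07)]
print(pos)  # [('d', 0.1), ('b', 0.08)]
```

    Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary.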
    Activation Density 0.001%

    No Known Activations
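    Reading the density figure: assuming activation density is the percentage of sampled tokens on which the feature's activation is above zero, 0.001% works out to about one token in 100,000. A minimal sketch of that calculation follows. `activation_density` and its `threshold` parameter are hypothetical names, and the activation array is made up for illustration.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: density counts any activation above the threshold as firing.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# A feature that fires on 1 token out of 100,000 (hypothetical data).
acts = np.zeros(100_000)
acts[42] = 3.2
print(f"{activation_density(acts):.3f}%")  # 0.001%
```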