    Negative Logits
    " chars"      -0.08
    "Chi"         -0.08
    " charisma"   -0.08
    " bustle"     -0.08
    "Maximum"     -0.08
    "plat"        -0.08
    " máximo"     -0.07
    " firms"      -0.07
    "Firm"        -0.07
    " Jade"       -0.07
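    Positive Logits
    " envoyer"    0.09   (French: "to send")
    " Gewalt"     0.09   (German: "violence")
    "\Response"   0.09
    " الإرهاب"    0.09   (Arabic: "terrorism")
    " भेज"        0.08   (Hindi: "send")
    " allé"       0.08   (French: "gone")
    " termasuk"   0.08   (Indonesian/Malay: "including")
    " geweld"     0.08   (Dutch: "violence")
    " violences"  0.08   (French: "violence")
    "تو"          0.08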
    Activation density: 0.002% (the feature fires on roughly 1 in 50,000 tokens)

    No known activations.