    Negative Logits
    featureID
    -0.54
     @"/
    -0.43
    PerformLayout
    -0.40
     Jurí
    -0.40
     arşivlendi
    -0.39
    paravant
    -0.39
    TargetApi
    -0.38
     Económica
    -0.38
     Artículos
    -0.38
     pleaſure
    -0.37
    Positive Logits
    而非
    0.77
    而不是
    0.70
     وليس
    0.62
     rather
    0.61
    Rather
    0.54
     bukan
    0.53
     Rather
    0.52
     вместо
    0.50
     statt
    0.50
     piuttosto
    0.50
    Activation Density: 0.193%

    No Known Activations