    Explanations

    negations or words indicating denial

    Negative Logits
     feroit         -0.91
     avoient        -0.88
     auroit         -0.82
     igång          -0.82
     étoient        -0.79
     présidenti     -0.78
     Monfieur       -0.77
     Chriftian      -0.77
     Chriſt         -0.76
     kullanılır     -0.74
    Positive Logits
     not            1.30
     não            1.13
     לא             1.07
     Não            1.05
     не             1.03
     Not            0.98
     tidak          0.97
    Não             0.97
     không          0.96
     δεν            0.95
    Activation Density: 0.022%

    No Known Activations