INDEX
Explanations
negations or words indicating denial
Negative Logits
feroit       -0.91
avoient      -0.88
auroit       -0.82
igång        -0.82
étoient      -0.79
présidenti   -0.78
Monfieur     -0.77
Chriftian    -0.77
Chriſt       -0.76
kullanılır   -0.74
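Positive Logits
not          1.30
não          1.13   (Portuguese "not")
לא           1.07   (Hebrew "not")
Não          1.05   (Portuguese "Not")
не           1.03   (Russian "not")
Not          0.98
tidak        0.97   (Indonesian "not")
Não          0.97   (Portuguese "Not")
không        0.96   (Vietnamese "not")
δεν          0.95   (Greek "not")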
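Logit tables like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that this dashboard's values are the plain dot product (some tools may scale or center them differently). The function names, the 3-dimensional vectors, and the toy vocabulary are hypothetical and chosen only for illustration; they are not the real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def feature_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    effects = {}
    for token, column in unembed.items():
        # Direct effect on this token's logit per unit of feature activation
        effects[token] = sum(d * u for d, u in zip(decoder_vec, column))
    return effects


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda item: item[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda item: item[1])[:k]
    return positive, negative


# Hypothetical toy weights (NOT taken from any real model)
decoder = [1.0, 0.5, -0.2]
unembed = {
    "not": [1.0, 0.4, 0.0],
    "não": [0.8, 0.5, 0.1],
    "avoient": [-0.6, -0.2, 0.5],
    "the": [0.0, 0.1, 0.3],
}
pos, neg = top_and_bottom(feature_logit_effects(decoder, unembed), k=2)
print(pos)
print(neg)
```

With these toy vectors, "not" and "não" rank highest and "avoient" lowest, mirroring the shape of the real tables; in an actual model the unembedding would cover the full vocabulary.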
Activation Density: 0.022%
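Activation density is usually reported as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, which suits ReLU-style sparse-autoencoder features. The function name `activation_density` and the synthetic sample are hypothetical, chosen so that the sample reproduces the 0.022% figure.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: the feature fires on 22 of 100,000 token positions
sample = [0.0] * 100_000
for i in range(22):
    sample[i * 4000] = 1.5
print(f"{activation_density(sample):.3f}%")  # prints 0.022%
```

A density this low means the feature is silent on almost every token, which fits a narrow concept such as negation words.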