INDEX
Explanations
negations in various forms
Negative Logits
culturelles       -0.69
}\                -0.67
jeter             -0.67
Mek               -0.66
publiques         -0.65
ceci              -0.63
respectively      -0.63
WireFormatLite    -0.63
suivantes         -0.62
prit              -0.61
Positive Logits
no                1.64
No                1.59
No                1.42
NO                1.42
no                1.13
NO                1.13
nof               1.06
Noyes             1.05
noOf              1.03
sno               1.01
Activations Density: 0.158%