Explanations
phrases indicating negation or exclusion
usage of "not" or "no"
Negative Logits
featureID    -0.28
algumas      -0.27
nød          -0.27
coming       -0.26
half         -0.26
IVEREF       -0.26
manque       -0.25
sometimes    -0.25
खु           -0.25
coming       -0.24
Positive Logits
kasarigan         0.72
Tembelea          0.69
surla             0.66
IntoConstraints   0.64
>*/               0.62
GraphicsUnit      0.61
cherchés          0.60
uLocal            0.59
⮕                 0.57
tagHelperRunner   0.57
Activations Density 0.177%