INDEX
Explanations
assertive language related to personal experiences and responsibilities
Negative Logits
meille           -0.79
nakalista        -0.76
honom            -0.75
avoient          -0.66
THEY             -0.66
auroit           -0.66
Wikimedijinoj    -0.65
它               -0.64
OfThe            -0.63
varandra         -0.63
Positive Logits
be               0.51
Geraadpleegd     0.49
herre            0.49
ensure           0.49
"?>              0.47
want             0.46
                 0.45
UCE              0.45
i                0.45
both             0.44
Activations Density 0.332%
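
The density figure above can be read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a non-zero
    activation. The threshold of 0.0 is illustrative, not a documented value.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
print(f"{activation_density(sample):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.332% would correspond to the feature firing on roughly 3 of every 1,000 token positions.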