INDEX
Explanations
variations or distinctions in behavior or treatment among subjects
contrasting how things are done
Negative Logits
=         -0.48
⇒         -0.42
an        -0.41
*         -0.38
chließ    -0.37
com       -0.37
=         -0.37
of        -0.36
0         -0.36
िल्म      -0.36
Positive Logits
differently       1.80
anders            1.01
differentially    0.98
Differ            0.94
diffé             0.93
inaczej           0.91
differ            0.91
Differ            0.79
autrement         0.78
DIFFERENT         0.73
Activations Density 0.006%