Explanations
some [entity/group/opinion]
Negative Logits
-       0.81
and     0.79
/       0.78
to      0.75
a       0.74
chnung  0.71
the     0.71
by      0.71
corr    0.67
/       0.66
Positive Logits
те           0.82
者は         0.78
kanë         0.78
티           0.77
ième         0.77
t            0.76
ુ            0.75
integrantes  0.72
ých          0.72
människor    0.71
Activations Density 0.501%