Explanations
phrases indicating ongoing actions or states
Negative Logits
Token             Logit
[*]               -0.52
மான               -0.50
oden              -0.47
tra               -0.47
dar               -0.46
derecho           -0.46
福利              -0.45
anger             -0.45
gub               -0.45
wsze              -0.44
Positive Logits
Token             Logit
estekak            0.82
LookAnd            0.79
Wikimedijinoj      0.71
Hauptartikel       0.70
propOrder          0.69
pescoço            0.67
ItemBackground     0.67
WireFormatLite     0.67
complexContent     0.65
setVerticalGroup   0.65
Activation density: 0.545%