Explanations
phrases indicating agreement or consistency within various studies and results
Negative Logits
Token          Logit
Walkover       -0.53
},{            -0.46
transfieras    -0.38
tables         -0.37
脱             -0.37
jok            -0.36
noDo           -0.35
trời           -0.35
énario         -0.35
WHOLE          -0.35
Positive Logits
Token          Logit
conferma       0.66
confirms       0.65
confirming     0.62
confirm        0.60
confirmó       0.59
predicted      0.56
confirmación   0.56
expected       0.56
confirmar      0.56
potwier        0.54
Activations Density 1.092%
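
The density figure can be read as the share of sampled token positions on which the feature fires. The following is a minimal sketch under that assumption only: the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard or its tooling.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which the
    feature is active, expressed as a percentage.
    """
    if not activations:
        return 0.0
    # Count positions where the feature fires (activation strictly above threshold).
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical per-token activations for one feature over four tokens.
sample = [0.0, 0.0, 1.5, 0.0]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 1.092% would mean the feature is active on roughly one sampled token in ninety.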