Explanations
Phrases indicating direction or intention
Negative Logits
Token        Logit
kasarigan    -0.64
MIC          -0.59
riscoll      -0.57
Vari         -0.56
Sole         -0.55
vaux         -0.55
Біографія    -0.54
Static       -0.52
Galle        -0.51
Inter        -0.51
Positive Logits
Token        Logit
towards       4.20
toward        4.12
towards       4.01
toward        3.90
Towards       3.79
Toward        3.70
Towards       3.63
Toward        3.29
hacia         2.73
envers        2.20
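Feature dashboards of this kind commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. This page does not state its procedure, so the sketch below is an assumption. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `w_u` are hypothetical, and the toy numbers are invented; they do not reproduce the values listed above.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through an unembedding matrix.

    w_u has d_model rows and vocab_size columns, so
    result[j] = sum_i decoder_dir[i] * w_u[i][j].
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    vocab: List[str], effects: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and k most-suppressed tokens."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=True)
    top = ranked[:k]                # largest positive effects first
    bottom = ranked[::-1][:k]       # most negative effects first
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
    vocab = ["towards", "hacia", "Static", "MIC"]
    decoder_dir = [1.0, 0.5]
    w_u = [
        [2.0, 1.0, -0.5, 0.0],
        [1.0, 1.0, 0.0, -2.0],
    ]
    effects = logit_effects(decoder_dir, w_u)
    top, bottom = top_and_bottom(vocab, effects, 2)
    print("positive:", top)
    print("negative:", bottom)
```

In a real model the projection runs over the full vocabulary, which is why the same word can show up several times (for example, with and without capitalization).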
Activation density: 0.046%
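Activation density is usually read as the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. The page does not define the term, so treat that definition as an assumption. The minimal sketch below follows it, and the function name `activation_density` is hypothetical. Under this reading, 0.046% means the feature is active on roughly 46 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens whose feature activation is strictly positive."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy activations for 4 tokens; only one is active.
    print(activation_density([0.0, 0.0, 0.3, 0.0]))
```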