INDEX
Explanations
phrases related to direction and progress towards a goal
Negative Logits
Calvo    -0.83
foglal   -0.71
pep      -0.70
5        -0.70
물       -0.68
печа     -0.66
t        -0.65
fatica   -0.65
T        -0.64
'        -0.64
Positive Logits
toward   1.94
toward   1.89
Toward   1.88
towards  1.84
Towards  1.82
Towards  1.77
towards  1.74
Toward   1.73
hacia    1.33
envers   1.25
Activations Density 0.052%