Explanations
phrases indicating direction or movement towards a goal or endpoint
Negative Logits
Calvo     -0.84
foglal    -0.72
fatica    -0.68
poffe     -0.67
물        -0.67
5         -0.67
T         -0.65
Bede      -0.63
ыре       -0.62
riuscito  -0.62
Positive Logits
toward    1.81
towards   1.77
Towards   1.75
Toward    1.75
toward    1.74
Towards   1.68
towards   1.63
Toward    1.61
hacia     1.25
envers    1.19
Activations Density: 0.060%