INDEX
Explanations
phrases that indicate direction or progression towards a goal or outcome
Negative Logits
Calvo       -0.84
foglal      -0.76
poffe       -0.70
riuscito    -0.69
fatica      -0.69
vandens     -0.67
5           -0.67
assoluto    -0.67
grunn       -0.66
물          -0.66
Positive Logits
toward      1.91
towards     1.88
toward      1.86
Towards     1.84
Toward      1.83
Towards     1.79
towards     1.76
Toward      1.72
hacia       1.30
TOW         1.20
Activations Density: 0.058%