Explanations
future actions or recommendations
Negative Logits
further           1.04
Further           0.93
Further           0.92
进一步            0.89
дальше            0.84
further           0.80
dissatisfaction   0.79
farther           0.78
onward            0.77
proceed           0.77
Positive Logits
return            1.23
return            1.22
Return            1.11
Return            1.10
返回              1.01
Returning         0.99
returning         0.99
returning         0.98
Returning         0.98
RETURN            0.97
Activations Density 0.028%