Explanations
phrases related to repeated actions and purchasing decisions
Negative Logits
once      -0.26
once      -0.23
Once      -0.20
Once      -0.20
onces     -0.19
finally   -0.19
latest    -0.18
/latest   -0.17
onse      -0.17
_once     -0.17
Positive Logits
gain      0.56
ag        0.53
gain      0.48
Gain      0.46
Gain      0.43
.ag       0.39
again     0.39
_gain     0.38
Ag        0.36
gains     0.36
Activations Density: 0.071%