Explanations
Days of the week written in abbreviated form (e.g. "Wed"), with varying activation strengths
References to specific days of the week, particularly Wednesday
Negative Logits
Reward                -0.71
externalActionCode    -0.71
ariat                 -0.66
OUP                   -0.66
OST                   -0.65
Legions               -0.65
Profile               -0.64
ANK                   -0.64
ustomed               -0.64
ère                   -0.63

Positive Logits
nesday    1.85
gew       0.96
ding      0.93
rox       0.91
Wed       0.90
een       0.87
roxy      0.82
gement    0.81
gie       0.79
lock      0.78
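
The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such a ranking could be produced, assuming the score for each token is the feature's decoder direction projected through the unembedding matrix (a direct-path estimate that ignores later layers and the final LayerNorm). The function name `top_logit_effects` and the toy matrices are hypothetical illustrations, not the dashboard's actual code or data.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumption (hypothetical): effect on token t = decoder_dir @ W_U[:, t],
    with decoder_dir of shape (d_model,) and W_U of shape (d_model, vocab_size).
    """
    effects = decoder_dir @ W_U              # one score per vocabulary token
    order = np.argsort(effects)              # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with made-up values (not the feature's real weights)
vocab = ["Wed", "Reward", "nesday", "the", "OST"]
decoder_dir = np.array([1.0, 0.0, 0.0, 0.0])
W_U = np.zeros((4, 5))
W_U[0] = [0.5, -0.2, 1.9, 0.0, -0.7]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # [('nesday', 1.9), ('Wed', 0.5)]
print(neg)  # [('OST', -0.7), ('Reward', -0.2)]
```

Sorting the full score vector once and slicing both ends keeps the negative and positive lists consistent with each other.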
Activation Density: 0.011%
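
Activation density is the share of tokens on which the feature fires, so 0.011% means roughly 1 token in 9,000. A minimal sketch of the calculation, assuming a token counts as an activation when the feature value is strictly above a threshold of 0 (typical for a ReLU sparse autoencoder); the function name and threshold are hypothetical choices, not the dashboard's documented definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption (hypothetical): "fires" means activation > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Example: one firing token out of 10,000
acts = np.zeros(10_000)
acts[42] = 3.5
print(f"{activation_density(acts) * 100:.3f}%")  # prints 0.010%
```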