Explanations
instances of the word "forward" and related terms indicating progress or advancement
Negative Logits
hetto       -0.16
reluct      -0.15
eral        -0.15
utes        -0.14
kou         -0.14
imas        -0.14
shar        -0.14
plete       -0.14
Minute      -0.14
ks          -0.14
Positive Logits
/back        0.27
wards        0.18
forward      0.18
QUIRES       0.18
-forward     0.18
-thinking    0.17
/down        0.17
warf         0.17
ward         0.17
forward      0.16
Activations Density: 0.037%