Explanations
phrases that indicate a progression or timeline
Negative Logits
uchs          -0.17
коз           -0.15
Protest       -0.14
INGER         -0.14
ize           -0.14
oppers        -0.14
ser           -0.14
isen          -0.14
Impossible    -0.14
INNER         -0.14
Positive Logits
buildup       0.21
build         0.20
run           0.19
Build         0.18
into          0.17
Run           0.17
run           0.16
Run           0.16
-build        0.16
upto          0.16
Activations Density 0.010%