Explanations
references to past experiences and transformations
Negative Logits
someday   -0.15
ignon     -0.14
_NEXT     -0.14
later     -0.13
Later     -0.13
later     -0.13
Later     -0.12
obel      -0.12
イク      -0.12
ду        -0.12
Positive Logits
prior      0.72
before     0.64
prior      0.58
Prior      0.56
previous   0.55
Prior      0.55
before     0.54
Before     0.52
BEFORE     0.52
Before     0.51
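A minimal sketch of how ranked logit lists like the two above can be produced. It assumes, without confirmation from this page, that a token's logit effect is the dot product of the feature's decoder direction with that token's unembedding vector. The helper names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: logit effect of the feature on each token, assumed to be
# the dot product of the feature's decoder direction with the token's
# unembedding vector.
def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
) -> Dict[str, float]:
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


# Return the k most positive and the k most negative tokens,
# each list ordered from the largest-magnitude effect downward.
def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]
    return positive, negative


# Toy 2-dimensional example; every number here is made up.
decoder_dir = [1.0, -0.5]
unembed = {
    "prior": [0.6, -0.2],
    "before": [0.5, 0.0],
    "later": [-0.1, 0.1],
    "someday": [-0.1, 0.2],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
# positive -> prior, before; negative -> someday, later
```

Repeated surface forms such as "later" appearing twice in a real list are usually distinct vocabulary entries (for example, with and without a leading space), which is why a ranking keyed on token IDs rather than display strings can show apparent duplicates.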
Activations Density 0.393%
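A density of 0.393% means the feature is active on roughly 4 of every 1,000 tokens (about 1 in 254). The sketch below assumes density is the fraction of tokens whose activation exceeds a threshold of 0.0; the helper `activation_density` and the sample data are hypothetical.

```python
from typing import Iterable


# Hypothetical helper: fraction of tokens on which the feature's activation
# is above `threshold` (0.0 by default, i.e. the feature "fires").
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Made-up data: 4 active tokens out of 1,000.
acts = [0.0] * 996 + [1.2, 0.3, 2.0, 0.5]
print(f"{activation_density(acts):.3%}")  # 0.400%
```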