INDEX
Explanations
expressions indicating outcomes or transitions in experiences
Negative Logits
Token      Logit
elman      -0.17
iele       -0.17
836        -0.15
eland      -0.14
oda        -0.14
fold       -0.14
amt        -0.14
Bindings   -0.13
ate        -0.13
Ø©         -0.13
Positive Logits
Token      Logit
urons      0.17
خذ         0.16
änn        0.16
ervo       0.16
ardin      0.15
rior       0.15
aits       0.15
lashes     0.14
½Ķ         0.14
.Method    0.14
Activations Density 0.224%