INDEX
Explanations
phrases indicating new beginnings or significant firsts
Negative Logits
chw       -0.15
agine     -0.15
hol       -0.15
chg       -0.15
anh       -0.15
ropa      -0.14
endar     -0.14
ard       -0.14
/logger   -0.14
inand     -0.14
Positive Logits
uspend     0.15
iferay     0.14
eliac      0.14
że         0.13
enus       0.13
.dsl       0.13
_FOREACH   0.13
lb         0.13
ighton     0.13
183        0.12
Activations Density 0.336%