INDEX
Explanations
proper names, particularly those associated with individuals or characters
Negative Logits
Token      Logit
onda       -0.17
uju        -0.15
formats    -0.15
kims       -0.14
utto       -0.14
uky        -0.14
(;;)       -0.14
uja        -0.14
etc        -0.14
cie        -0.14
Positive Logits
Token        Logit
captain      0.23
yellow       0.22
capt         0.20
_yellow      0.19
Yellow       0.19
substitute   0.19
Unused       0.19
Yellow       0.19
undle        0.19
captains     0.19
Activations Density 0.017%
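
The density figure can be read as the fraction of sampled token positions on which the feature fires. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    nonzero (here: above-threshold) activation for this feature.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 17 of 100,000 token positions.
sample = [1.0] * 17 + [0.0] * 99_983
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.017%
```

Under this reading, a density of 0.017% corresponds to roughly 17 active positions per 100,000 tokens, which is typical of a sparse feature.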