Explanations
references to family relationships and dynamics
Negative Logits
Token      Logit
rette      -0.16
елов       -0.15
amation    -0.14
adro       -0.14
šlo        -0.13
umbing     -0.13
urette     -0.13
ëŁŃ        -0.13
Dan        -0.13
ød         -0.13
Positive Logits
Token               Logit
ener                0.15
.hw                 0.15
æĺŃ                 0.15
etCode              0.14
otten               0.14
/moment             0.14
ocker               0.14
neighbor            0.14
%%%%%%%%%%%%%%%%    0.14
pq                  0.14
Activations Density: 0.002%