INDEX
Explanations
references to familial and close relationships
Negative Logits
inar      -0.16
umpy      -0.15
yat       -0.15
.jasper   -0.14
bud       -0.14
buch      -0.14
aning     -0.14
wich      -0.14
warf      -0.14
finity    -0.13
Positive Logits
Nack      0.15
_keeper   0.15
Alman     0.14
sucker    0.14
ORB       0.13
alam      0.13
592       0.13
rž        0.13
/misc     0.13
contr     0.13
Activations Density 0.061%