INDEX
Explanations
terms related to family relationships, particularly those denoting maternal and paternal connections
Negative Logits
patch         -0.15
Livingston    -0.14
foy           -0.14
261           -0.14
ibe           -0.14
onom          -0.13
Cout          -0.13
«a            -0.13
Patton        -0.13
chner         -0.13
Positive Logits
isman         0.16
Frid          0.15
ukan          0.15
wash          0.15
timeofday     0.15
Rnd           0.14
Townsend      0.14
ilia          0.14
ieres         0.14
Premium       0.14
Activations Density: 0.009%