Explanations
mentions of children, family, and play-related activities
Negative Logits
idak    -0.16
URN     -0.15
ucken   -0.15
Zá      -0.15
unan    -0.15
pubs    -0.14
üst     -0.14
otte    -0.14
oga     -0.14
ivel    -0.14
Positive Logits
Barb      0.24
Leg       0.24
Thomas    0.23
Fisher    0.23
LEG       0.23
Dup       0.22
magna     0.21
trains    0.21
Ton       0.20
princess  0.20
Activations Density: 0.181%