INDEX
Explanations
actions and interactions involving children and their play activities
Negative Logits
hell        -0.16
damned      -0.15
damn        -0.15
fucking     -0.15
skyt        -0.15
åķ          -0.15
Fuck        -0.15
shit        -0.15
piel        -0.15
indsight    -0.14
Positive Logits
mommy       0.27
lots        0.21
grown       0.18
ummy        0.17
Lots        0.17
pretend     0.17
bye         0.16
scary       0.16
eya         0.16
silly       0.16
Activations Density 0.111%