INDEX
Explanations
interactive prompts and activities targeting children's language skills
Negative Logits
評価      -0.18
FUCK      -0.17
fucking   -0.17
fucked    -0.17
fucks     -0.16
Fuck      -0.16
hell      -0.16
ày        -0.15
cazzo     -0.15
fuck      -0.15
POSITIVE LOGITS
umpt      0.19
lots      0.19
grown     0.17
opsy      0.16
mommy     0.16
Lots      0.15
humans    0.15
lots      0.15
upil      0.15
silly     0.15
Activations Density 0.267%