Explanations
phrases describing age or childhood experiences
Negative Logits
afort       -0.16
adder       -0.16
ument       -0.16
flix        -0.15
quier       -0.15
ofi         -0.14
pari        -0.14
шев         -0.14
ws          -0.14
kont        -0.14
Positive Logits
kid          0.28
teenager     0.27
child        0.26
boy          0.21
child        0.21
impression   0.20
kids         0.19
teens        0.19
teenagers    0.19
kid          0.19
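The two lists above appear to rank vocabulary tokens by how strongly this feature raises or lowers their output logits. What follows is a minimal sketch of one common way to produce such lists. It assumes, without confirmation from this page, that each score is the dot product of the feature's decoder vector with that token's column of the unembedding matrix. The function names `logit_effects` and `top_and_bottom`, along with the toy vocabulary and all numbers, are hypothetical.

```python
# Hypothetical sketch: ranking tokens by a feature's direct effect on logits.
# Assumes score(token) = decoder_vec . W_U[:, token]; names and data are illustrative.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, score) pairs, one per vocabulary entry.

    decoder_vec: list of d_model floats (the feature's write direction).
    unembed: d_model rows, each a list of len(vocab) floats (W_U).
    vocab: token strings, one per column of unembed.
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive tokens and the k most negative tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]          # largest scores first
    negative = ranked[::-1][:k]    # most negative scores first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["kid", "teenager", "flix", "adder"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.2, 0.1, -0.1, -0.2],
    [0.1, 0.2, 0.0, 0.1],
]
pos, neg = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=2)
print([t for t, _ in pos])  # ['kid', 'teenager']
print([t for t, _ in neg])  # ['adder', 'flix']
```

Sorting once and then reading the ranking from both ends means a single pass yields both the positive and the negative list. The negative list is ordered most negative first, which matches the layout of the lists above.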
Activation Density: 0.041%
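Activation density is the share of token positions on which the feature fires. A value of 0.041% therefore corresponds to roughly 41 firing positions per 100,000 tokens. Below is a minimal sketch of the calculation. It assumes that "fires" means an activation strictly above zero, since this page does not state its threshold. The function name `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions whose feature activation exceeds a threshold (assumed 0.0).

def activation_density(activations, threshold=0.0):
    """Percentage of positions where the activation is above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data: 41 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(41):
    acts[i * 1000] = 1.5
print(activation_density(acts))  # 0.041
```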