Explanations
references to growing up or childhood
Negative Logits
ibe          -0.17
wid          -0.17
grown        -0.17
phas         -0.15
eturn        -0.15
Growing      -0.14
ankind       -0.14
Aging        -0.14
apon         -0.14
igham        -0.13
Positive Logits
knowing       0.24
surrounded    0.22
poor          0.22
hearing       0.22
near          0.20
around        0.19
watching      0.18
listening     0.18
alongside     0.17
attending     0.17
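The page does not say how these logit scores were computed. A common approach in sparse-autoencoder feature dashboards, which is assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and then list the tokens with the most positive and most negative scores. The sketch below uses only the standard library. `logit_effects`, `top_and_bottom` and the toy vectors are hypothetical and do not come from any real model.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

# Assumption: score(token) = dot(decoder direction, token's unembedding vector).
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Return one logit-effect score per vocabulary token."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Split scores into the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy, made-up inputs: a 2-d decoder direction and three hypothetical tokens.
decoder_dir = [1.0, 0.0]
unembed = {"a": [0.24, 0.5], "b": [-0.17, 0.3], "c": [0.20, -1.0]}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(positive, negative)
```

With these toy inputs the second dimension contributes nothing, so token `a` ranks highest and token `b` ranks lowest.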
Activation Density: 0.022%
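The source does not define "activation density". If it means the percentage of token positions on which the feature's activation is above zero (an assumption), then 0.022% works out to about 22 firing positions per 100,000 tokens. The hypothetical helper below computes that percentage from a list of activations. The `threshold` parameter is an illustrative assumption.

```python
import math
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation is strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# Made-up example: 22 nonzero activations among 100,000 token positions.
density = activation_density([0.0] * 99_978 + [1.5] * 22)
print(f"{density:.3f}%")
assert math.isclose(density, 0.022)
```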