Explanations
references to personal upbringing and childhood experiences
Negative Logits
Token      Logit
",__       -0.14
iasm       -0.13
eturn      -0.13
agina      -0.13
ãĥIJãĥ¼     -0.13
.yang      -0.12
ypy        -0.12
704        -0.12
夫         -0.12
owl        -0.12
Positive Logits
Token       Logit
surrounded  0.27
idol        0.23
hearing     0.22
raised      0.22
exposed     0.21
immersed    0.21
amid        0.21
poor        0.21
amidst      0.21
bilingual   0.20
Activations Density: 0.037%