INDEX
Explanations
references to upbringing or childhood experiences
Negative Logits
apon      -0.18
ibe       -0.17
apor      -0.17
ias       -0.16
706       -0.15
adies     -0.14
wid       -0.14
Harris    -0.14
chalk     -0.14
ñana      -0.14
Positive Logits
surrounded    0.19
hearing       0.18
knowing       0.18
privileged    0.16
watching      0.15
seeing        0.15
ENSE          0.15
podob         0.14
seedu         0.14
associ        0.14
Activations Density: 0.018%