INDEX
Explanations
references to instincts and natural responses in behavioral contexts
Negative Logits
Token       Logit
ITO         -0.15
ãģĭãģ«      -0.14
.rm         -0.14
uzzi        -0.14
iphy        -0.14
zel         -0.14
彦          -0.14
aura        -0.14
acion       -0.14
remot       -0.14
Positive Logits
Token       Logit
instinct    0.22
naturally   0.20
natural     0.19
natural     0.18
instincts   0.18
天          0.17
inst        0.16
innate      0.15
natur       0.15
Natural     0.15
Activations Density: 0.131%