INDEX
Explanations
themes related to anthropomorphism and emotional responses to characters and situations
Negative Logits
Particip      -0.08
Req           -0.07
rary          -0.07
ream          -0.07
cec           -0.06
plode         -0.06
quential      -0.06
Intialized    -0.06
YD            -0.06
regn          -0.06
Positive Logits
human         0.08
human         0.07
humanoid      0.07
-human        0.07
humans        0.06
near          0.06
.Interface    0.06
Human         0.06
Harmony       0.06
ίνα           0.06
Activations Density 0.000%