INDEX
Explanations
phrases emphasizing relationships and characteristics of individuals
Negative Logits
anything     -0.15
enes         -0.14
ishment      -0.14
dration      -0.14
agnostics    -0.14
simp         -0.14
selves       -0.13
個人         -0.13
любой        -0.13
bsp          -0.13
Positive Logits
whom         0.24
connections  0.21
ties         0.20
interests    0.20
plans        0.19
dreams       0.19
ambitions    0.19
aspirations  0.18
Down         0.18
roots        0.16
Activations Density 0.288%