INDEX
Explanations
references to interpersonal relationships and human interactions
Negative Logits
celik      -0.17
nh         -0.16
spender    -0.15
nell       -0.14
AFX        -0.14
insky      -0.14
lements    -0.14
velt       -0.14
ано        -0.14
eneral     -0.14
Positive Logits
axter      0.16
ailles     0.16
orman      0.15
Dit        0.15
volumes    0.14
akit       0.14
elyn       0.14
олж        0.13
ixer       0.13
prepared   0.13
Activations Density 0.974%