INDEX
Explanations
interactions involving familial relationships and emotional exchanges
Negative Logits
antom     -0.17
pag       -0.17
amer      -0.16
ose       -0.15
vyk       -0.15
unci      -0.14
roky      -0.14
unc       -0.14
illac     -0.14
алеж      -0.14
POSITIVE LOGITS
©         0.16
º         0.14
ober      0.14
лÑİÑĩа    0.13
iser      0.13
umuz      0.13
ysz       0.13
ifix      0.13
[".       0.13
Helm      0.13
Activations Density: 0.333%