Explanations
expressions of indifference or concern in relationships
Negative Logits
aled       -0.18
rok        -0.15
edia       -0.15
dum        -0.15
positor    -0.14
奏         -0.14
ignon      -0.14
ulle       -0.14
caling     -0.14
agn        -0.14
Positive Logits
about      0.22
cared      0.22
cares      0.20
lessly     0.18
tentang    0.18
caring     0.17
cter       0.17
闲         0.16
about      0.15
ycl        0.15
Activations Density: 0.018%