INDEX
Explanations
actions related to planning and decision-making
Negative Logits
них         -0.18
(es         -0.16
ellas       -0.16
ellos       -0.15
opis        -0.14
нее         -0.14
dy          -0.14
nik         -0.13
gratuiti    -0.13
å¾          -0.13
Positive Logits
la          0.25
la          0.21
lah         0.20
ankan       0.19
el          0.19
un          0.18
les         0.18
le          0.18
una         0.17
-la         0.17
Activations Density 0.055%
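
The density figure is usually read as the fraction of token positions on which the feature fires at all. The following is a minimal sketch of that calculation, assuming "fires" means an activation strictly above zero; the function name, the threshold parameter, and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly greater
    than `threshold` (0.0 by default). The dashboard may use another rule.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 20,000 token positions, with the feature active on 11.
acts = [0.0] * 20000
for i in range(11):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.055%
```

A density this low means the feature is silent on almost every token, so the logit lists above describe its effect only on the rare positions where it is active.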