Explanations
phrases indicating a conclusion or outcome
Negative Logits
/from        -0.15
Ì£           -0.14
Desde        -0.14
å£           -0.14
/of          -0.14
иÑĩеÑģки     -0.14
otton        -0.13
ouro         -0.13
Hub          -0.13
از           -0.13
Positive Logits
needing      0.24
being        0.21
having       0.20
with         0.19
feeling      0.18
spending     0.17
face         0.17
on           0.17
falling      0.16
somewhere    0.16
Activations Density: 0.030%