    Explanations

    phrases indicating a conclusion or outcome

    Negative Logits
    "/from"        -0.15
    "̣"             -0.14
    " Desde"       -0.14
    "å£"           -0.14
    "/of"          -0.14
    "ически"       -0.14
    "otton"        -0.13
    "ouro"         -0.13
    "Hub"          -0.13
    " از"          -0.13
    Positive Logits
    " needing"     0.24
    " being"       0.21
    " having"      0.20
    " with"        0.19
    " feeling"     0.18
    " spending"    0.17
    " face"        0.17
    " on"          0.17
    " falling"     0.16
    " somewhere"   0.16
    Activation Density: 0.030%

    No Known Activations