    Explanations

    elements that express positivity and supportiveness

    Negative Logits
    " indeed"     -0.24
    " only"       -0.21
    " atleast"    -0.20
    "only"        -0.20
    " именно"     -0.19
    " quite"      -0.19
    " both"       -0.18
    " neither"    -0.18
    " BOTH"       -0.18
    " ONLY"       -0.18
    Positive Logits
    " plain"         0.23
    " thôi"          0.23
    "ifiable"        0.22
    "ifi"            0.20
    "ifying"         0.20
    "IFI"            0.19
    "普通"           0.18
    "plain"          0.18
    " Plain"         0.18
    "vailability"    0.17
    Activation Density: 0.167%

    No Known Activations