INDEX
    Explanations

    complex statements expressing disagreement or uncertainty

    negations or conditions

    Negative Logits
    :✨                  -0.71
     autorytatywna      -0.70
     виправивши         -0.69
    ArgsConstructor     -0.66
     &___               -0.65
     ویکی‌پدی            -0.63
    AndEndTag           -0.60
    IUrlHelper          -0.59
    ChildScrollView     -0.58
    WebServlet          -0.58
    Positive Logits
     preguntar          0.34
     tocar              0.34
     ahorrar            0.33
     derajat            0.33
     taken              0.33
     schlagen           0.32
     frère              0.32
     modifikasi         0.32
    gafas               0.31
     acusado            0.31
    Activation Density: 0.186%

    No Known Activations