    Explanations

    Expressions of disappointment and questioning statements

    Follows commas or periods

    Disagreement or rejection

    Negative Logits
    Token              Logit
    "<bos>"            -0.65
    "ніципа"           -0.54
    " defaultstate"    -0.48
    "aspectj"          -0.46
    " spô"             -0.45
    " altında"         -0.44
    " způ"             -0.40
    "conexao"          -0.40
    " długość"         -0.40
    "restes"           -0.40
    Positive Logits
    Token              Logit
    " WRONG"           1.19
    " Nope"            1.09
    " wrong"           1.08
    " Wrong"           1.07
    "WRONG"            1.05
    "wrong"            1.02
    " nope"            0.99
    "nope"             0.98
    "Nope"             0.96
    "Wrong"            0.94
    Activation Density: 0.206%

    No Known Activations