    Explanations

    indications of negation and uncertainty in statements

    Negative Logits
    <bos>         -2.28
     intersper    -1.09
     disambigu    -0.94
     intermitt    -0.86
     unsus        -0.85
     intrigu      -0.84
     endow        -0.83
     unspeak      -0.82
     ineffec      -0.81
     guil         -0.81
    Positive Logits
     signora      1.05
     sorella      1.03
     vacanza      0.98
     paradiso     0.93
     preghi       0.92
     sfera        0.90
     Muhamma      0.89
     dott         0.89
     muna         0.87
    ">/           0.86
    Activation Density: 0.325%

    No Known Activations