INDEX
    Explanations

    ambiguous or unclear words and phrases

    Negative Logits
    Token         Logit
     sidew        -0.72
     heel         -0.70
    hement        -0.69
     intuitive    -0.67
     defamation   -0.67
     suppressed   -0.66
     sway         -0.66
     flush        -0.66
     disg         -0.65
     positively   -0.64

    Positive Logits
    Token         Logit
    lihood        1.00
    Else          0.93
    âĦ¢           0.89
     Limits       0.86
    tons          0.84
    ness          0.83
    itarian       0.82
    !,            0.80
    cott          0.79
    sburg         0.79
    Activation Density: 0.162%

    No Known Activations