    Explanations

    positive responses or agreements

    affirmative responses to questions or statements

    Negative Logits
    Token          Logit
    "sing"         -0.77
    "drawn"        -0.66
    "abal"         -0.62
    " reverted"    -0.61
    "ded"          -0.59
    "ylan"         -0.58
    " mourning"    -0.56
    " sling"       -0.56
    " bearer"      -0.56
    "ined"         -0.55

    Positive Logits
    Token          Logit
    "terday"       1.03
    " sir"         0.98
    " Absolutely"  0.79
    "YES"          0.76
    "Absolutely"   0.76
    "Answer"       0.75
    " yes"         0.74
    " Nope"        0.72
    "!,"           0.72
    "yes"          0.70

    Activation Density: 0.111%

    No Known Activations