INDEX
    Explanations

    instances of yes or no responses

    affirmative responses or expressions of agreement

    Negative Logits
    Token         Logit
    "kefeller"    -0.70
    "aign"        -0.60
    "uese"        -0.59
    "artney"      -0.57
    "illin"       -0.56
    "drawn"       -0.55
    "agra"        -0.55
    "riers"       -0.55
    " Citiz"      -0.54
    "gins"        -0.53
    Positive Logits
    Token            Logit
    " sir"           1.11
    "!"              1.04
    "."              0.98
    " Absolutely"    0.95
    "Absolutely"     0.93
    "!."             0.91
    "!!!"            0.86
    "!!!!"           0.86
    " yes"           0.85
    "!!"             0.85
    Act Density 0.169%
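
As a rough aid for reading the density figure, the sketch below converts a percentage into an expected number of activating tokens. It assumes that activation density means the fraction of sampled tokens on which the feature is active; the page itself does not define the term. The function name `expected_activations` and the 100,000-token sample size are hypothetical and chosen only for illustration.

```python
def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumes density_percent is the percentage of tokens with a nonzero
    activation (an assumption, not something stated on the page).
    """
    return density_percent / 100.0 * n_tokens


# Hypothetical sample of 100,000 tokens at the listed density of 0.169%.
print(expected_activations(0.169, 100_000))
```

Under that assumption, the feature would be active on roughly 17 of every 10,000 tokens, which is consistent with a sparse feature tied to short affirmative replies.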

    No Known Activations