INDEX
    Explanations

    direct questions and inquiries

    questions that begin with "if" or "whether."

    Negative Logits
    Token       Logit
    "abal"      -0.90
    "alde"      -0.77
    "ulic"      -0.68
    "thodox"    -0.66
    "nown"      -0.65
    "Bonus"     -0.65
    "astered"   -0.64
    "*=-"       -0.61
    "tc"        -0.61
    "won"       -0.60
    Positive Logits
    Token           Logit
    "ĻĤ"            0.83
    "amera"         0.83
    " forgiveness"  0.82
    " curfew"       0.71
    " permission"   0.70
    " mosqu"        0.70
    "ihad"          0.68
    " displeasure"  0.68
    "asking"        0.67
    " watering"     0.66
    Activation Density: 0.078%

    No Known Activations