INDEX
    Explanations

    questions starting with "Why do" or "Why are"

    questions and inquiries about motivations and reasons behind actions

    Negative Logits
    'iHUD'        -0.83
    'abase'       -0.74
    'orage'       -0.74
    'yssey'       -0.72
    'opian'       -0.71
    'orge'        -0.70
    'aukee'       -0.69
    'orthy'       -0.65
    'ibaba'       -0.65
    ' Pwr'        -0.64
    Positive Logits
    '?]'          0.84
    ' nobody'     0.80
    ' everyone'   0.80
    ' so'         0.80
    ' people'     0.79
    ' SO'         0.74
    'everyone'    0.73
    ' liberals'   0.73
    ' everybody'  0.72
    '?".'         0.71
    Activation Density: 0.062%

    No Known Activations