INDEX
    Explanations

    questions starting with the word "What"

    questions beginning with "What."

    Negative Logits
    Token          Logit
    "ped"          -0.67
    "por"          -0.65
    "roads"        -0.64
    "eer"          -0.63
    "rod"          -0.62
    "lot"          -0.62
    "ulic"         -0.61
    "gal"          -0.61
    "Gy"           -0.61
    "uttering"     -0.60
    Positive Logits
    Token               Logit
    "soever"            1.29
    " happens"          1.11
    " happened"         1.03
    " distinguishes"    0.94
    " transpired"       0.94
    " happ"             0.92
    " kinds"            0.87
    " Lies"             0.85
    " sorts"            0.84
    " else"             0.84
    Activation Density: 0.092%

    No Known Activations