INDEX
    Explanations

    questions starting with "How did" or "Why did"

    questions starting with "How" or "Did."

    Negative Logits
    houses      -0.74
    heter       -0.72
    washer      -0.71
     Methods    -0.71
    thur        -0.70
    rooms       -0.70
    arters      -0.69
    room        -0.68
    atten       -0.68
    north       -0.68
    Positive Logits
    actic       1.01
    iosyncr     0.82
    netflix     0.77
    nt          0.68
    IER         0.68
     originate  0.68
    ĸļ          0.67
     not        0.66
    riks        0.65
     undergo    0.65
    Act Density 0.042%

    No Known Activations