INDEX
    Explanations

    questions or prompts starting with the word "What"

    questions beginning with the word "What."

    Negative Logits
    apsed           -0.74
    swick           -0.66
    emp             -0.64
    hew             -0.64
    pic             -0.63
    iva             -0.61
    Lago            -0.61
    ammed           -0.61
    rolley          -0.59
    abel            -0.59
    Positive Logits
     do             0.94
     does           0.94
     kinds          0.91
     distinguishes  0.89
     determines     0.88
     happens        0.87
     motiv          0.84
     qualifies      0.82
     are            0.81
     did            0.81
    Act Density 0.054%

    No Known Activations