INDEX
    Explanations

    questions starting with "How does" or "What does".

    questions beginning with "how" or "what does".

    Negative Logits
    Token         Logit
    "ascript"     -0.84
    "devices"     -0.76
    "fights"      -0.73
    "ãģ«"         -0.73
    "offs"        -0.70
    "fter"        -0.69
    "zzo"         -0.69
    "artifacts"   -0.69
    "ases"        -0.69
    "legram"      -0.68
    Positive Logits
    Token         Logit
    " anybody"    1.00
    " anyone"     0.99
    "olation"     0.84
    " this"       0.74
    "n"           0.71
    " it"         0.68
    " ANY"        0.67
    "olated"      0.65
    " anything"   0.64
    "olate"       0.64
    Activation Density: 0.041%

    No Known Activations