INDEX
    Explanations

    questions starting with the word "What"

    instances of the word "What" and its variations, particularly in questions

    Negative Logits
    interstitial  -0.73
    768           -0.62
    Ń·            -0.62
    Bey           -0.62
    Discover      -0.58
    atis          -0.56
    recy          -0.55
    ffe           -0.55
    ean           -0.55
    uchs          -0.54
    Positive Logits
    soever        1.43
     happened     1.15
     happens      1.11
    ?!            1.05
     else         1.01
     happ         1.00
    ?!"           0.97
    !?            0.95
    !?"           0.88
     bothers      0.83
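    The two logit lists appear to be the most negative and most positive entries of this feature's direct effect on the output logits. Below is a minimal sketch of how such lists could be produced. It assumes, without confirmation from this page, that the effect on each token is the feature's decoder vector multiplied by the model's unembedding matrix. The function name `top_logit_effects` and the toy matrices are hypothetical; the toy values are copied from four of the entries listed here.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive logit effects of one feature.

    Assumption (hypothetical): the effect on token t is the dot product of the
    feature's decoder direction with column t of the unembedding matrix.
    """
    effects = w_dec_row @ w_unembed          # shape: (vocab_size,)
    order = np.argsort(effects)              # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["soever", "interstitial", " happened", "768"]
w_dec_row = np.array([1.0, 0.0])              # toy decoder direction
w_unembed = np.array([[1.43, -0.73, 1.15, -0.62],
                      [0.0, 0.0, 0.0, 0.0]])  # toy unembedding, shape (d_model, vocab)

negative, positive = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(negative)  # [('interstitial', -0.73), ('768', -0.62)]
print(positive)  # [('soever', 1.43), (' happened', 1.15)]
```

    Sorting once with `argsort` and reading both ends gives the negative and positive lists from a single pass over the vocabulary.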
    Activation Density 0.077%
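    Activation density is read here as the fraction of sampled tokens on which the feature fires. The following minimal sketch assumes that "fires" means an activation strictly above zero. The page does not state the threshold, so that is an assumption, and the helper names `activation_density` and `tokens_per_activation` are hypothetical. The sketch also converts the 0.077% figure into an average spacing between firings.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

def tokens_per_activation(density_percent):
    """Average number of tokens per firing, for a density given in percent."""
    return 100.0 / density_percent

# Toy activations (hypothetical): the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 2.5, 0.0]))  # 0.25
# A density of 0.077% corresponds to about one firing per ~1,300 tokens.
print(round(tokens_per_activation(0.077)))       # 1299
```

    In other words, at 0.077% the feature is active on roughly one token in 1,300.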

    No Known Activations