INDEX
    Explanations

    questions starting with "Why."

    questions and phrases that express uncertainty or curiosity about reasons and explanations

    Negative Logits
     ILCS               -0.80
     icing              -0.67
    OLOGY               -0.62
    ibus                -0.61
     kilometres         -0.60
    combe               -0.58
    ylan                -0.58
    achus               -0.57
    \\\\\\\\\\\\\\\\    -0.57
    atellite            -0.57
    Positive Logits
    ãĢij                0.71
    ppo                 0.67
    ]).                 0.66
     Matters            0.66
    so                  0.65
    ãĤ»                 0.65
    ug                  0.65
    ]),                 0.64
     Hebdo              0.63
     differently        0.63
    Act Density 0.270%

    No Known Activations