INDEX
    Explanations

    questions starting with "How" or "What" indicating inquiry or seeking information

    Negative Logits
    "/how"        -0.16
    " how"        -0.15
    "jev"         -0.14
    ".amazonaws"  -0.14
    "何"          -0.14
    "s"           -0.14
    "stru"        -0.14
    " why"        -0.14
    " x"          -0.13
    " Tent"       -0.13
    Positive Logits
    " does"       0.29
    " Does"       0.29
    " do"         0.26
    " did"        0.24
    "does"        0.24
    "Does"        0.23
    " Do"         0.22
    " Are"        0.20
    " should"     0.19
    " Should"     0.19
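    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way dashboards derive such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k most negative and k most positive results. The sketch below uses hypothetical function names (`logit_effects`, `top_and_bottom`) and toy vectors, and it ignores the final layer norm that a real model applies before unembedding.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_vec: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of a feature's decoder direction
    with each token's unembedding vector (final layer norm ignored)."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, row)) for tok, row in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most positive (descending) and
    the k most negative (ascending) token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional example; these numbers are illustrative, not real model weights.
    toy_unembed = {" does": [0.3, 0.0], " how": [-0.1, 0.1], "s": [0.0, 0.0]}
    effects = logit_effects([1.0, -0.5], toy_unembed)
    top, bottom = top_and_bottom(effects, 1)
    print("positive:", top)
    print("negative:", bottom)
```

    With real weights, the decoder vector would come from the feature's row of the sparse autoencoder decoder and the unembedding from the language model's output matrix; both are assumptions about how this dashboard was built.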
    Activation Density 0.042%
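    Activation density is, as commonly defined on such dashboards (an assumption here), the share of tokens on which the feature's activation is above zero. A figure of 0.042% therefore corresponds to roughly 42 firing tokens per 100,000. A minimal sketch, with a hypothetical `activation_density` function and a configurable threshold:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens observed, so report zero density
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Illustrative data: 42 nonzero activations among 100,000 tokens.
    acts = [0.0] * 99_958 + [1.0] * 42
    print(f"{activation_density(acts):.3f}%")
```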

    No Known Activations