INDEX
    Explanations

    questions starting with "How" and their variants

    Negative Logits
    "whether"     -0.14
    "lier"        -0.14
    "WA"          -0.14
    "ovable"      -0.14
    " Whether"    -0.14
    "ÏĢι"         -0.13
    "aneously"    -0.13
    "elier"       -0.13
    "406"         -0.13
    " proh"       -0.13

    Positive Logits
    " did"        0.35
    " does"       0.29
    " do"         0.29
    "did"         0.27
    " Did"        0.22
    "Did"         0.22
    " long"       0.21
    ".did"        0.21
    ")did"        0.21
    " old"        0.20

    Activation Density: 0.036%

    No Known Activations