INDEX
    Explanations

    questions starting with "why."

    NEGATIVE LOGITS
    "ffa"        -0.16
    "onse"       -0.15
    "amework"    -0.15
    "leine"      -0.14
    "sters"      -0.14
    "UNET"       -0.14
    "ners"       -0.14
    "ivate"      -0.14
    "aurus"      -0.14
    "wins"       -0.13
    POSITIVE LOGITS
    "ever"       0.23
    "alla"       0.19
    " do"        0.19
    " did"       0.18
    " does"      0.17
    "te"         0.17
    " bother"    0.16
    " else"      0.16
    " waste"     0.16
    " not"       0.16
    Activation Density: 0.023%

    No Known Activations