    Explanations

    Questions and expressions of uncertainty or confusion

    Followed by question words

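    Negative Logits
    (Tokens are quoted so that leading spaces, which are part of the token, stay visible.)
    Token             Logit
    'bool'            -0.49
    ' bool'           -0.42
    'them'            -0.40
    'quedas'          -0.35
    'more'            -0.35
    'ś'               -0.35
    'Cl'              -0.34
    '遠慮'            -0.33
    'EXPERIMENTAL'    -0.33
    '[]"'             -0.33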
    Positive Logits
    Token             Logit
    ' how'             1.93
    ' How'             1.41
    'How'              1.34
    ' HOW'             1.28
    'how'              1.25
    ' what'            1.18
    'HOW'              1.03
    ' cómo'            1.03
    'ώς'               1.02
    'Как'              1.00
    Activation density: 0.352%

    No known activations.
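The page does not say how its numbers were produced. A minimal sketch, assuming the common convention for learned-feature dashboards: activation density is the fraction of token positions where the feature's activation is above zero, and the logit lists come from projecting the feature's decoder direction through the model's unembedding matrix and taking the most negative and most positive entries. The function names, argument names, and array shapes below are hypothetical and chosen for illustration only.

```python
import numpy as np


def activation_density(acts: np.ndarray) -> float:
    """Fraction of token positions where the feature fires (activation > 0).

    Assumption: "density" means the share of strictly positive activations.
    """
    return float(np.mean(acts > 0))


def top_logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Rank vocabulary tokens by the feature's direct effect on the logits.

    Assumptions (hypothetical shapes): decoder_dir is (d_model,),
    W_U is (d_model, vocab_size), and vocab[i] is the string for token i.
    Returns (positive, negative): each a list of (token, logit) pairs,
    positive sorted from largest down, negative from smallest up.
    """
    logits = decoder_dir @ W_U          # (vocab_size,) logit contribution per token
    order = np.argsort(logits)          # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative
```

With this reading, a density of 0.352% would mean the feature is active on roughly 3.5 of every 1,000 tokens in whatever dataset was sampled; for example, `activation_density(np.array([0.0, 0.0, 1.2, 0.0]))` gives 0.25.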