INDEX
    Explanations

    questions and interrogative phrases

    Negative Logits
    ₁.          0.84
     .,         0.80
     。,         0.78
    '.          0.77
    .*;         0.77
    }.          0.77
    }^{+}$.     0.76
    ].          0.73
    。.          0.73
    .[          0.73
    Positive Logits
    ?           4.73
                4.32
    ؟           4.25
    ?"          4.04
    ?)          3.88
    ?”          3.86
    ?</         3.79
    ?\          3.75
    ?'          3.67
                3.65
    Act Density 2.282%

    No Known Activations