INDEX
    Explanations

    instances of parentheses and related formatting in the text

    Negative Logits
    Token       Logit
    "auc"       -0.15
    "กต"        -0.15
    "uke"       -0.14
    "itou"      -0.14
    " Kür"      -0.14
    "ことは"    -0.14
    "itler"     -0.14
    "】,"       -0.14
    "一区"      -0.14
    "conde"     -0.13
    Positive Logits
    Token          Logit
    "ISC"          0.20
    " semi"        0.17
    " gas"         0.16
    "wo"           0.16
    "afa"          0.15
    "株"           0.15
    "brace"        0.14
    " lowercase"   0.14
    " insert"      0.14
    "_Insert"      0.14
    Act Density 0.052%

    No Known Activations