    Explanations

    instances of the word "explain" and its variations, indicating a focus on descriptions or clarifications

    Negative Logits
    Token       Logit
    "readcr"    -0.20
    "achi"      -0.16
    "chef"      -0.15
    "dit"       -0.15
    "las"       -0.15
    "ÌĢ"        -0.14
    "ÑģÑİ"      -0.14
    "há"        -0.14
    "inally"    -0.14
    "缮"        -0.14
    Positive Logits
    Token       Logit
    " why"      0.22
    "why"       0.17
    "为ä»Ģä¹Ī"   0.16
    "oad"       0.16
    "ì°¨"       0.14
    "ĩ"         0.14
    "OFFSET"    0.14
    "artner"    0.14
    "ovnÄĽ"     0.14
    " offs"     0.14
    Activation Density: 0.041%

    No Known Activations