INDEX
    Explanations

    various punctuation marks and their associated contexts

    NEGATIVE LOGITS
    "また"        -0.16
    ".jp"         -0.16
    "áh"          -0.14
    "ordinate"    -0.14
    "Bias"        -0.13
    "ード"        -0.13
    "ká"          -0.13
    "olt"         -0.13
    ":"           -0.13
    "ement"       -0.13

    POSITIVE LOGITS
    " why"         0.24
    " how"         0.23
    " an"          0.20
    " what"        0.19
    "一种"         0.18
    " a"           0.18
    " Part"        0.17
    " How"         0.17
    " Why"         0.17
    " aka"         0.15
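The two logit lists above read as the k most negative and k most positive entries of a single per-token score vector. The page does not say how that vector is computed. The sketch below only shows the top-k/bottom-k split, using a few values from the table. The function name `split_logits` is hypothetical.

```python
def split_logits(logits, k):
    """Return (positive, negative): the k highest and k lowest (token, value) pairs.

    Hypothetical helper; assumes `logits` maps token strings to scores.
    """
    ranked = sorted(logits.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]                    # most negative first
    positive = list(reversed(ranked[-k:]))   # most positive first
    return positive, negative


# A subset of the values listed above (leading spaces are part of the token).
example = {" why": 0.24, " how": 0.23, " an": 0.20, ".jp": -0.16, ":": -0.13}
pos, neg = split_logits(example, 2)
print(pos)  # [(' why', 0.24), (' how', 0.23)]
print(neg)  # [('.jp', -0.16), (':', -0.13)]
```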
    Act Density 0.115%
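The page does not define activation density. A common reading is the fraction of token positions where the feature's activation is above zero. Under that assumption, 0.115% means roughly 115 firing positions per 100,000 tokens. In the sketch below, both the name `activation_density` and the threshold of 0.0 are assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions where the activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic data: 115 nonzero activations spread over 100,000 token positions.
acts = [0.0] * 100_000
for i in range(115):
    acts[i * 800] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.115%
```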

    No Known Activations