    Explanations

    punctuation and transitional phrases indicating ongoing thoughts or actions

    Negative Logits
    Token      Logit
    "ť"        -0.16
    "upal"     -0.15
    "foy"      -0.15
    "Isl"      -0.15
    "vey"      -0.14
    "oltip"    -0.14
    "olt"      -0.14
    "ahan"     -0.13
    "âĹĭ"      -0.13
    "ÏĨι"      -0.13
    Positive Logits
    Token        Logit
    "å¢"         0.17
    "zd"         0.14
    "ê·ł"        0.14
    "emplates"   0.14
    " dek"       0.13
    "osen"       0.13
    "ayout"      0.13
    "ãģ¤ãģ¶"     0.13
    "ç¥"         0.13
    " MyBase"    0.13
    Activation Density: 0.293%

    No Known Activations