INDEX
    Explanations

    text structuring and formatting cues

    colons and their associated textual contexts

    Negative Logits
    Token             Logit
    " adversaries"    -0.75
    "æ©"              -0.73
    " pestic"         -0.71
    "¥ŀ"              -0.71
    "userc"           -0.70
    "senal"           -0.69
    " breakthrough"   -0.67
    "phabet"          -0.67
    " ingred"         -0.66
    " hemor"          -0.66

    Positive Logits
    Token             Logit
    " âĨij"           0.90
    " Interesting"    0.88
    " Originally"     0.85
    "Show"            0.85
    "Originally"      0.73
    " Wow"            0.72
    " Hmm"            0.72
    " Assuming"       0.72
    "Nice"            0.68
    " Surely"         0.68

    Activation Density: 0.073%

    No Known Activations