    Explanations

    colons and specific structured formatting indicators in code or documentation

    Negative Logits
    "介"          -0.15
    "osta"        -0.14
    "notated"     -0.14
    " Shak"       -0.14
    "olland"      -0.14
    " Gig"        -0.14
    "vore"        -0.14
    "oy"          -0.14
    "phet"        -0.13
    "ková"        -0.13
    Positive Logits
    " STYLE"        0.16
    " transitions"  0.15
    " outlaw"       0.15
    "izza"          0.14
    "HECK"          0.14
    "Ь"             0.14
    "elm"           0.14
    " Styles"       0.14
    "dojo"          0.14
    " style"        0.13
    Activation Density 0.002%

    No Known Activations