    Explanations

    tokens or characters with high frequency or specific formatting, potentially indicative of programming or code elements

    Negative Logits
    Token        Logit
    ".stamp"     -0.17
    "errat"      -0.17
    "vetica"     -0.16
    "olini"      -0.16
    "ivent"      -0.14
    " sab"       -0.14
    "ream"       -0.13
    "Sab"        -0.13
    "466"        -0.13
    "steder"     -0.13

    Positive Logits
    Token        Logit
    " point"      0.25
    " Point"      0.25
    "che"         0.22
    "point"       0.22
    "-point"      0.21
    "Point"       0.21
    " position"   0.20
    " pop"        0.20
    "點"          0.19
    " che"        0.19
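
    The positive-logit token 點 (Chinese for "point" or "dot") appears in the raw dashboard text as "é»ŀ". That string matches GPT-2-style byte-level BPE rendering, where each UTF-8 byte of a token is shown as a printable Unicode character. The sketch below assumes the model uses GPT-2's byte-to-character table (an assumption; the model is not named on this page) and inverts it. The function names are hypothetical, chosen for illustration.

```python
# Sketch: invert GPT-2-style byte-level BPE rendering (assumed tokenizer scheme).

def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 table mapping each byte (0-255) to a printable character."""
    # Printable bytes keep their own code point: '!'..'~', '¡'..'¬', '®'..'ÿ'.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every other byte (space, control characters, etc.) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Reverse lookup: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_bpe_token(token: str) -> str:
    """Turn a displayed byte-level BPE token back into the text it encodes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_bpe_token("é»ŀ"))            # 點
print(repr(decode_bpe_token("Ġpoint")))   # ' point'
```

    A token that holds only part of a multi-byte character would decode to U+FFFD replacement characters here, which is why errors="replace" is used instead of raising.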
    Activation Density: 0.007%
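
    An activation density of 0.007% means the feature fired on roughly 7 of every 100,000 tokens sampled. A minimal sketch of that arithmetic, assuming density is the share of tokens whose activation exceeds zero (the threshold and the function name are assumptions; the dashboard does not state how it counts):

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percent of tokens whose feature activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 7 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(7):
    acts[i * 1000] = 1.5
print(activation_density(acts))  # 0.007
```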

    No Known Activations