INDEX
    Explanations

    Punctuation

    Negative Logits
     gew           -0.08
     pew           -0.07
     brag          -0.07
     '.'           -0.07
     statement     -0.07
     Glasses       -0.06
     HOW           -0.06
    .false         -0.06
    lea            -0.06
     Eve           -0.06
    Positive Logits
    tür             0.07
    .,↵             0.07
    ,url            0.07
    ิ่              0.06
    },↵             0.06
    ',↵↵            0.06
    ↵	↵             0.06
    (),↵            0.06
    (",")↵          0.06
     merkezi        0.06
    Act Density 0.134%

    No Known Activations