    NEGATIVE LOGITS
    "strip"         -0.07
    " gd"           -0.06
    " HAS"          -0.06
    "olic"          -0.06
    "oidal"         -0.06
    " crossings"    -0.06
    " further"      -0.06
    "_circle"       -0.06
    "deque"         -0.06
    "indic"         -0.06
    POSITIVE LOGITS
    (token not shown)    0.07
    "(TokenType"         0.07
    (token not shown)    0.07
    " acquainted"        0.07
    " errores"           0.06
    " nicotine"          0.06
    "\tHAL"              0.06
    " له"                0.06
    "(Art"               0.06
    "ανά"                0.06
    Act Density 0.001%

    No Known Activations