    Negative Logits
    Token           Logit
    " Parr"         -0.07
    " Luft"         -0.07
    " ascent"       -0.06
    "δα"            -0.06
    "etzt"          -0.06
    " testify"      -0.06
    " aşağı"        -0.06
    " strt"         -0.06
    "_extractor"    -0.06
    " punched"      -0.06
    Positive Logits
    Token           Logit
    "Spec"          0.09
    "\tspec"        0.07
    " lessen"       0.07
    " Premium"      0.07
    "spec"          0.07
    " Bool"         0.06
    "needed"        0.06
                    0.06
    " Suns"         0.06
    "[]."           0.06
    Activation Density: 0.001%

    No Known Activations