    NEGATIVE LOGITS
    Token           Logit
    ".CG"           -0.06
    " spacing"      -0.06
    (whitespace)    -0.06
    " absorbs"      -0.06
    " hypocrisy"    -0.06
    " bears"        -0.06
    " Larger"       -0.06
    " stands"       -0.06
    "ाइव"           -0.06
    " bearings"     -0.06
    POSITIVE LOGITS
    Token             Logit
    "la"              0.09
    "λα"              0.08
    "ela"             0.08
    ".Tool"           0.08
    "ubl"             0.07
    "oola"            0.07
    "ULA"             0.07
    (token missing)   0.07
    "ilda"            0.07
    "kal"             0.07
    Activation Density: 0.015%

    No Known Activations