    Negative Logits
    Token         Logit
    " accepted"   -0.07
    "model"       -0.06
    " VECTOR"     -0.06
    "zd"          -0.06
    " vanish"     -0.06
    "combine"     -0.06
    " وق"         -0.06
    " probing"    -0.06
    "uffles"      -0.06
    "ひと"        -0.06
    Positive Logits
    Token                     Logit
    ".nil"                    0.07
    " Levi"                   0.07
    (whitespace-only token)   0.06
    " Dining"                 0.06
    "_coupon"                 0.06
    " आप"                     0.06
    "Islam"                   0.06
    "(rowIndex"               0.06
    " tempList"               0.06
    ".Max"                    0.06
    Activation Density: 0.018%

    No Known Activations