    Negative Logits
    Token          Logit
    " Revised"     -0.07
    "Neg"          -0.07
    " revised"     -0.06
    "Radius"       -0.06
    " Davis"       -0.06
    "_today"       -0.06
    "Marvel"       -0.06
    "REL"          -0.06
    "-Length"      -0.06
    "neg"          -0.06
    Positive Logits
    Token               Logit
    "	Block"          0.07
    " grö"              0.07
    " اتفاق"            0.06
    (no token shown)    0.06
    "特色"              0.06
    " kaz"              0.06
    " kez"              0.06
    (no token shown)    0.06
    " stockings"        0.06
    " Εθν"              0.06
    Activation Density: 0.003%

    No Known Activations