    Negative Logits
    _seat
    -0.07
    ewidth
    -0.06
     en
    -0.06
    -0.06
    UID
    -0.06
    ительных
    -0.06
    -0.06
     میدان
    -0.06
    ộn
    -0.06
     수도
    -0.06
    Positive Logits
     fores
    0.07
     част
    0.07
    	params
    0.06
    Intermediate
    0.06
    $results
    0.06
    اشی
    0.06
     afflicted
    0.06
     commenting
    0.06
     deceived
    0.06
    	s
    0.06
    Activation Density: 0.003%

    No Known Activations