INDEX
    NEGATIVE LOGITS
    Token            Logit
    "Based"          -0.06
    "~-"             -0.06
    " rozs"          -0.06
    " влас"          -0.06
    " نگهداری"       -0.06
    " touchdowns"    -0.06
    " moisture"      -0.05
    "чна"            -0.05
    "\th"            -0.05
    (token missing)  -0.05
    POSITIVE LOGITS
    Token            Logit
    " hayat"         0.07
    "IR"             0.07
    " designated"    0.07
    " messages"      0.07
    " погод"         0.07
    " mistaken"      0.06
    " ls"            0.06
    " propre"        0.06
    " sending"       0.06
    "غير"            0.06
    Activation Density: 0.026%

    No Known Activations