    Negative Logits
    Token          Logit
    " wyp"         -0.08
    "يز"           -0.08
    " conc"        -0.07
    " humili"      -0.07
    " dys"         -0.07
    " diet"        -0.07
    " ambiente"    -0.07
    " cutting"     -0.07
    " segundos"    -0.07
    " دی"          -0.07

    Positive Logits
    Token          Logit
    "-average"     0.10
    " район"       0.08
    "Coords"       0.08
    " eftersom"    0.08
    " averaged"    0.08
    ":center"      0.08
    "-centered"    0.08
    "itudes"       0.08
    "Gaussian"     0.08
    ".ber"         0.08

    Act Density 0.028%

    No Known Activations