INDEX
    Explanations
    Negative Logits
    Token           Logit
    " věku"         -0.07
    "Mvc"           -0.07
    "movie"         -0.07
    "med"           -0.06
    " sıras"        -0.06
    " ż"            -0.06
    "ashed"         -0.06
    " Soldier"      -0.06
    " Wimbledon"    -0.06
    "place"         -0.06
    Positive Logits
    Token           Logit
                    0.07
    " المس"         0.06
    " unsure"       0.06
    " Noise"        0.06
    "vailability"   0.06
    "ΕΡ"            0.06
    " adopted"      0.06
    " Adopt"        0.06
    " vice"         0.06
    " foster"       0.06
    Activation Density: 0.002%

    No Known Activations