    Negative Logits
     travelled       -0.06
     Beled           -0.06
    سلام             -0.06
     iPhone          -0.06
     orderby         -0.05
     "../../../../   -0.05
    ,,,,             -0.05
     zorunlu         -0.05
    алися            -0.05
     있던             -0.05
    Positive Logits
     MQ              0.07
     اصل             0.07
     Bunun           0.06
     embarrass       0.06
     Dost            0.06
    -sum             0.06
     cải             0.06
     нас             0.06
    DonaldTrump      0.06
    ½                0.06
    Activation Density: 0.001%

    No Known Activations