INDEX
    Explanations
    Negative Logits
     ضبط
    -0.08
    _bad
    -0.08
     Fur
    -0.08
    lau
    -0.07
    keen
    -0.07
     Kai
    -0.07
     simp
    -0.07
    ея
    -0.07
    -0.07
     Watching
    -0.07
    Positive Logits
    तः
    0.12
    stay
    0.11
     yếu
    0.09
     takeaway
    0.09
     मौ
    0.08
     smokers
    0.08
    ‌ترین
    0.08
    మంత్రి
    0.08
    ವಾಗಿ
    0.08
    amente
    0.08
    Act Density 0.036%

    No Known Activations