    Negative Logits
    ինգ    -0.08
     инструк    -0.08
     കോൺ    -0.08
    ʻiga    -0.08
    -iṣẹ    -0.08
    ოგ    -0.08
     اقدامات    -0.08
    инга    -0.08
    ‌ന    -0.07
    ინგ    -0.07
    Positive Logits
    (not rendered)    0.09
     remb    0.08
    (not rendered)    0.08
    (not rendered)    0.08
     madr    0.07
     enterr    0.07
    (not rendered)    0.07
     liberté    0.07
    .step    0.07
    yyy    0.07
    Activation Density: 0.019%

    No Known Activations