INDEX
    Explanations
    Negative Logits
        Token                   Logit
        " Heb"                  0.41
        "حب"                    0.40
        "هل"                    0.39
        "ihu"                   0.38
        "हरी"                   0.38
        "ahari"                 0.38
        "Хо"                    0.38
        "手数"                  0.37
        (token not captured)    0.37
        "хар"                   0.37
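    Positive Logits
        Token                   Logit
        " H"                    3.70
        "H"                     2.69
        " اله"                  2.58
        " h"                    2.33
        (token not captured)    2.27
        (token not captured)    2.17
        " ه"                    1.97
        " ہ"                    1.91
        (token not captured)    1.90
        (token not captured)    1.74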
    Act Density 0.098%

    No Known Activations