INDEX
    Explanations
    Negative Logits
    ജീ    2.03
     কর্ম    2.02
    দ্র    2.00
    ชาติ    1.95
     gezien    1.89
    कृपया    1.88
     Đ    1.88
     abo    1.86
    ʒ    1.85
     HomeComponent    1.84
    Positive Logits
    esque    2.99
    ل    2.89
    l    2.88
    ми    2.62
    in    2.50
    stown    2.49
    لک    2.41
    ן    2.33
     dotyczą    2.32
    larni    2.28
    Activation Density 0.073%

    No Known Activations