    Explanations

    abbreviations and initialisms

    NEGATIVE LOGITS
    一定的  0.46
    يا  0.45
    ت  0.44
    其他  0.42
    يع  0.42
    дел  0.41
    غير  0.40
    Ngoài  0.40
    [token not shown]  0.39
    也是  0.38
    POSITIVE LOGITS
     for  0.59
    ur  0.49
     lesions  0.49
     hewan  0.45
     troops  0.45
     beverages  0.44
     인한  0.44
     pháp  0.44
     নিষ্পত্তি  0.44
    O  0.43
    Activation density: 0.078%
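As a rough guide to reading this figure, the sketch below assumes activation density means the fraction of token positions on which the feature's activation is above zero; the function name `activation_density` and the `threshold` parameter are hypothetical, chosen for illustration. Under that assumption, 0.078% corresponds to about 78 active positions in every 100,000 tokens.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Hypothetical helper: assumes "density" is the share of positions
    with an activation strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 78 active positions out of 100,000 tokens.
acts = [1.0] * 78 + [0.0] * (100_000 - 78)
print(f"{activation_density(acts):.3%}")  # 0.078%
```

A feature this sparse fires rarely, which is consistent with the page listing no known activating examples.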

    No Known Activations