INDEX
    Explanations

    abstracted concepts concluding phrases

    Negative Logits
    Token            Logit
    "اسی"            0.38
    "దయ"             0.37
    " 函数"          0.34
    " Carpathian"    0.34
    "therapy"        0.34
    " الشعر"         0.34
    " problemler"    0.34
    "وعة"            0.34
    "}}^{*"          0.34
    "*:"             0.33
    Positive Logits
    Token            Logit
    " Ours"          0.39
    " ."             0.38
    " ہے۔"           0.37
    " READ"          0.37
    "hende"          0.37
    "BUGFS"          0.37
    " સામાન્ય"       0.37
    "emis"           0.36
    "r"              0.36
    " ause"          0.36
    Activation Density: 0.040%

    No Known Activations