    Explanations

    introductions following punctuation

    Negative Logits
    Token  Value
    "不上"  0.79
    "むしろ"  0.76
    "డో"  0.73
    "რულ"  0.69
    "كنولوج"  0.68
    " მხოლოდ"  0.68
    "മ്ബ"  0.68
    " угодно"  0.66
    "리고"  0.66
    "ുകൊണ്ടാണ്"  0.65
    Positive Logits
    Token  Value
    " which"  5.53
    "which"  5.26
    " Which"  4.28
    " WHICH"  4.23
    "Which"  4.19
    " которая"  3.98
    " который"  3.94
    " которые"  3.73
    " który"  3.59
    " която"  3.56
    Act Density 0.235%

    No Known Activations