    Explanations

    prohibited responses that

    Negative Logits
    जिसे  0.48
    機構  0.40
     کجا  0.40
    যার  0.40
     এলাম  0.39
     posiblemente  0.39
     seguramente  0.38
    うち  0.38
    வாள  0.38
    member  0.37
    Positive Logits
     tế  0.43
    ته  0.40
     содержа  0.38
     Tez  0.38
     acknowledged  0.38
    ohia  0.37
    tgz  0.37
    рито  0.37
     zuk  0.37
    teis  0.37
    Act Density 0.016%

    No Known Activations