INDEX
    Explanations

    punctuation followed by "it"

    Negative Logits
     پہلی
    0.66
     गोद
    0.59
     કરવાનો
    0.59
     करण्याचे
    0.59
     फाउंडेशन
    0.59
     एडमिशन
    0.58
    join
    0.58
     सेकेंड
    0.58
     deel
    0.57
    Professor
    0.57
    Positive Logits
     Tanto
    0.71
    ermanfaat
    0.69
     شرطونو
    0.69
     Cress
    0.67
     masc
    0.67
    isOpen
    0.64
    ონი
    0.64
     токси
    0.64
    0.63
    ตร
    0.63
    Act Density 0.012%

    No Known Activations