    Explanations

    descriptive words following common words

    Negative Logits
    Token               Logit
     میرے               1.56
    (token not shown)   1.54
    들의                 1.47
     اپنے               1.44
     그의                1.40
     자신의               1.40
     다른                1.39
    시키는                1.36
    กับ                 1.36
     ہمارے              1.35
    Positive Logits
    Token               Logit
    amenti              0.91
    ettu                0.90
    uvu                 0.87
     बात                0.83
    (token not shown)   0.83
    noff                0.82
    assertions          0.80
    westen              0.80
    (token not shown)   0.80
    ;",                 0.78
    Activation Density: 0.243%

    No Known Activations