INDEX
    Explanations

    following words and phrases

    Negative Logits
    Token (quoted; a leading space is part of the token)    Value
    " Каждый"         0.42
    " Rowling"        0.41
    " मोटरसाइकिल"      0.41
    " raped"          0.41
    "করিয়া"           0.40
    " ग्ला"            0.40
    "тбол"            0.39
    " Estadística"    0.39
    " इसमें"           0.38
    "वासी"            0.38
    Positive Logits
    Token (quoted; a leading space is part of the token)    Value
    " magyar"    0.45
    "om"         0.45
    "mag"        0.43
    "init"       0.43
    " juh"       0.42
    "gu"         0.42
    " gu"        0.41
    "att"        0.41
    "gram"       0.40
    "aux"        0.40
    Activation Density: 0.000%

    No Known Activations