INDEX
    Explanations
    Negative Logits
    " principals"   -0.09
    "princip"       -0.08
    "Princip"       -0.08
    " Johan"        -0.08
    " philosophy"   -0.08
    " Princip"      -0.08
    " filoz"        -0.08
    " Cyril"        -0.07
    " Hugo"         -0.07
    " Ped"          -0.07
    Positive Logits
    " کردن"          0.10
    " งาน"           0.09
    "gelegen"        0.09
    " แม"            0.08
    " carpets"       0.08
    " uusi"          0.07
    " bilden"        0.07
    " într"          0.07
    " spreadsheets"  0.07
    "ghar"           0.07
    Activation Density: 0.003%

    No Known Activations