    Negative Logits
    Token                 Logit
    "öst"                 -0.09
    " أه"                 -0.09
    " ఉద్యోగ"              -0.08
    "ulich"               -0.08
    ".expression"         -0.08
    " Schau"              -0.08
    "ahl"                 -0.08
    " شر"                 -0.08
    " Qualifications"     -0.08
    " geeignet"           -0.07
    Positive Logits
    Token                 Logit
    " gemeinsame"         0.11
    "common"              0.11
    " common"             0.10
    "\tcommon"            0.10
    " shared"             0.10
    "Shared"              0.10
    "Common"              0.09
    "(common"             0.09
    "shared"              0.09
    " gemeinsamen"        0.09
    Activation Density: 0.024%

    No Known Activations