    Negative Logits
     españoles      -0.08
     españolas      -0.07
    .mask           -0.07
     offending      -0.07
     muß            -0.07
     Osman          -0.07
     taxable        -0.07
     imports        -0.07
     friv           -0.07
     slightly       -0.07
    Positive Logits
    0.12
     acompañ
    0.11
     পাশে
    0.11
     companionship
    0.11
     accompaniment
    0.10
     accompagne
    0.10
     cheering
    0.09
     jederzeit
    0.09
     dispuesto
    0.09
     भरो
    0.09
    Activation Density: 0.034%

    No Known Activations