INDEX
    Explanations
    NEGATIVE LOGITS
    Token        Logit
    " Fer"       -0.08
    " nests"     -0.08
    ".kind"      -0.08
    "HF"         -0.08
    " Nest"      -0.08
    " nest"      -0.08
    " Cari"      -0.08
    " Kind"      -0.08
    " playful"   -0.07
    "Kind"       -0.07
    POSITIVE LOGITS
    Token          Logit
    " etiquette"   0.09
    " النجاح"      0.08
    " bals"        0.08
    " успех"       0.08
    " decl"        0.08
    " অনুষ্ঠ"       0.08
    " smo"         0.08
    " मुकाब"        0.08
    "zwa"          0.08
    " विवाह"        0.07
    Activation Density: 0.003%

    No Known Activations