    NEGATIVE LOGITS
     또한          -0.10
                   -0.08
     liệu          -0.08
     autonom       -0.08
    .IT            -0.07
     এছ            -0.07
     cependant     -0.07
    chini          -0.07
     mtu           -0.07
    inho           -0.07
    POSITIVE LOGITS
     willen        0.12
     want          0.12
     interested    0.12
     चाहते          0.11
     চান           0.11
     quieres       0.11
     ترغب          0.11
     wish          0.11
     wanted        0.10
     wants         0.10
    Act Density 0.057%

    No Known Activations