    NEGATIVE LOGITS
    "يان"                 0.38
    "dominal"             0.37
    (token not shown)     0.37
    " अकबर"               0.36
    "elling"              0.36
    "kam"                 0.35
    "thern"               0.35
    "niki"                0.35
    " zeal"               0.35
    " typ"                0.35
    POSITIVE LOGITS
    " Mih"                0.50
    "ссмо"                0.44
    " nih"                0.44
    " Mij"                0.41
    " Nih"                0.40
    " нейтро"             0.40
    " mih"                0.39
    "чів"                 0.39
    "ilation"             0.38
    " affirmat"           0.37
    Activation Density: 0.001%

    No Known Activations