INDEX
    Explanations
    Negative Logits
    Token           Logit
    "_cond"         -0.08
    "acent"         -0.08
    " constant"     -0.08
    "शी"            -0.08
    "τει"           -0.07
    [token not shown]  -0.07
    "constant"      -0.07
    "적인"          -0.07
    "Nec"           -0.07
    " stretched"    -0.07
    Positive Logits
    Token           Logit
    " সবাই"         0.13
    " juntos"       0.11
    " కలిసి"        0.11
    "ಿಬ್ಬ"          0.11
    " gemeinsam"    0.11
    " 모두"         0.10
    " allemaal"     0.10
    " birlikte"     0.10
    " бірге"        0.10
    "Together"      0.09
    Activation Density: 0.063%

    No Known Activations