INDEX
    Explanations
    Negative Logits
    Token                         Logit
    "fty"                         -0.07
    "Ext"                         -0.06
    "rch"                         -0.06
    "quad"                        -0.06
    " numbered"                   -0.06
    " Premiership"                -0.06
    "και"                         -0.06
    "jured"                       -0.06
    "Lou"                         -0.06
    " scorer"                     -0.06
    Positive Logits
    Token                         Logit
    "idd"                         0.07
    " Meals"                      0.07
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  0.07
    "(Database"                   0.07
    ".then"                       0.06
    " Unary"                      0.06
    " schn"                       0.06
    " اللغة"                      0.06
    " özellik"                    0.06
    ".matmul"                     0.06
    Activation Density: 0.078%

    No Known Activations