    Explanations

    making decisions

    Negative Logits
    (token not captured)    -0.07
     WHICH    -0.07
     тільки    -0.07
    which    -0.07
     specifics    -0.06
     گو    -0.06
    され    -0.06
    otec    -0.06
    лага    -0.06
     parte    -0.06
    Positive Logits
    قرار    0.07
                    ↵                ↵    0.07
    .goal    0.07
    ↵		↵    0.06
    "]))↵    0.06
    _centers    0.06
     fallback    0.06
    tlement    0.06
    xFD    0.06
    thood    0.06
    Act Density 0.381%

    No Known Activations