INDEX
    Explanations

    claims/arguments

    Negative Logits
    " Maggie"        -0.07
    " maintained"    -0.07
    " FormData"      -0.07
    " guesses"       -0.07
    "mach"           -0.07
    " TS"            -0.07
    " circuits"      -0.07
    " گرف"           -0.06
    " ts"            -0.06
    "Entry"          -0.06
    Positive Logits
    "ندر"            0.06
    "ाइव"            0.06
    "xBB"            0.06
    "[top"           0.06
    " чит"           0.06
                     0.06
    "ětí"            0.06
    "_minor"         0.06
    "_original"      0.06
    "يكي"            0.06
    Activation Density: 0.050%

    No Known Activations