    NEGATIVE LOGITS
     splurge        0.41
     Sikhs          0.40
     Railways       0.40
     syndicated     0.40
     weeding        0.39
     Shia           0.39
     ThemeOverlay   0.39
    🐀               0.39
    🕙               0.39
     flyers         0.39
    POSITIVE LOGITS
    ensuremath      0.93
    mathcal         0.87
    mathbb          0.86
    mathbf          0.83
    mbox            0.81
    text            0.75
    operatorname    0.72
     {\             0.71
    left            0.67
    mathrm          0.65
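    A minimal sketch of how ranked token lists like these can be produced. It assumes each score is the dot product of the feature's decoder direction with a token's unembedding vector, with the highest scores forming the positive list and the lowest the negative list. The helper `logit_effects` and the toy vectors are hypothetical, and a real dashboard may scale or normalize the values it displays.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[Ranked, Ranked]:
    """Hypothetical helper: score every token by the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    scores = {
        tok: sum(d * w for d, w in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Highest scores: tokens whose logits the feature pushes up.
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Lowest scores: tokens whose logits the feature pushes down.
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy example (made-up numbers): a 2-dimensional residual stream, three tokens.
pos, neg = logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed={"tok_a": [0.9, 0.1], "tok_b": [0.2, 0.5], "tok_c": [-0.4, 0.3]},
    k=1,
)
print(pos, neg)  # [('tok_a', 0.9)] [('tok_c', -0.4)]
```

    In practice the same projection would be computed for the whole vocabulary at once with a matrix product. The loop form here only keeps the sketch dependency-free.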
    Act Density 0.001%

    No Known Activations