    Negative Logits
    y          0.81
    t          0.70
    try        0.66
    a          0.63
    (blank)    0.63
    de         0.61
    ta         0.61
    tg         0.59
    tre        0.58
    dozen      0.58
    Positive Logits
    (blank)        0.81
    +}$            0.65
     braiding      0.61
     FYI           0.60
     rheumatoid    0.59
     pretzels      0.58
     blew          0.58
     refugees      0.58
     bombers       0.57
     scolded       0.57
    Activation Density: 0.087%

    No Known Activations