    NEGATIVE LOGITS
    Token          Logit
    " ppt"         -0.69
    "telen"        -0.69
    " phys"        -0.68
    " pass"        -0.67
    " Wassers"     -0.66
    " Wiss"        -0.65
    "couch"        -0.65
                   -0.65
    "quen"         -0.65
    "glass"        -0.65
    POSITIVE LOGITS
    Token          Logit
    " waffle"      1.79
    " waffles"     1.78
    "🧇"           1.66
    " Waffle"      1.36
    " Waff"        1.29
    " irons"       1.23
    "waffle"       1.09
    "Waffle"       1.04
    " Belgian"     1.03
    " Brussels"    1.02
    Activation Density: 0.013%

    No Known Activations