INDEX
    Explanations
    Negative Logits
    Token            Logit
    " muß"           -0.86
    "・・・・・"          -0.85
    " läßt"          -0.82
    "・・・・"           -0.81
    " müßte"         -0.69
                     -0.66
                     -0.65
    " ......."       -0.65
    " unsurpassed"   -0.65
    " mußte"         -0.64
    Positive Logits
    Token            Logit
    " Idk"           1.08
    " shitty"        1.05
    " idk"           1.05
    " fucked"        1.03
    " fucking"       1.03
    "idk"            1.02
    " tbh"           1.02
    " goddamn"       0.99
    " FUCKING"       0.98
    " weirdly"       0.98
    Act Density 0.312%

    No Known Activations