INDEX
    Explanations

    specific symbols or unusual characters in the text

    Negative Logits
     â̝
    -0.18
     COVID
    -0.17
     ðŁĶ
    -0.16
    âĢį
    -0.16
    â̝
    -0.15
    â
    -0.15
    COVID
    -0.15
    ðŁ
    -0.15
    abbix
    -0.15
    ï¸ı
    -0.15
    Positive Logits
     fucking
    0.34
     fuck
    0.29
     fucks
    0.29
     sod
    0.28
     fucked
    0.28
     Fucking
    0.28
     shit
    0.27
     FUCK
    0.27
     Fuck
    0.27
     Fucked
    0.26
    Activation Density: 0.029%

    No Known Activations