INDEX
    Explanations

    references to fictional characters and creative works

    Negative Logits
    "upal"      -0.18
    " paddle"   -0.17
    " padd"     -0.16
    ".uf"       -0.15
    " Flake"    -0.14
    " Gün"      -0.14
    "ubby"      -0.14
    "tn"        -0.14
    " tank"     -0.14
    " Shut"     -0.14
    Positive Logits
    " Witch"    0.24
    " Ger"      0.19
    " witch"    0.18
    "Ger"       0.18
    "aska"      0.17
    " Sabb"     0.17
    "witch"     0.17
    " Netflix"  0.16
    " Polish"   0.16
    "麻"        0.16
    Activation Density: 0.007%

    No Known Activations