INDEX
    Explanations

    mentions of the word "Swift," particularly in reference to Taylor Swift's music or brand

    Negative Logits
    Token         Logit
    "voie"        -0.15
    " grues"      -0.15
    "šk"          -0.14
    "PF"          -0.14
    " Dennis"     -0.14
    "/="          -0.14
    "ysa"         -0.13
    "Truthy"      -0.13
    " Tro"        -0.13
    "ogie"        -0.13
    Positive Logits
    Token         Logit
    "s"           0.15
    "ened"        0.15
    "filer"       0.15
    "arna"        0.14
    " à¤ķथ"       0.14
    "ë§IJ"        0.14
    "못"          0.14
    "608"         0.14
    "Ctrls"       0.14
    " ext"        0.14
    Activation Density 0.002%

    No Known Activations