INDEX
    Explanations

    mentions of news sources and social media links

    Negative Logits
    Token               Logit
    "ÙĦس"               -0.17
    "io"                -0.16
    "hd"                -0.15
    "abling"            -0.15
    " meme"             -0.15
    "-Smith"            -0.15
    "ÏģÏį"              -0.14
    "hydr"              -0.14
    "gap"               -0.14
    "268"               -0.14
    Positive Logits
    Token               Logit
    "isas"              0.20
    " Arg"              0.16
    "Mirror"            0.16
    " bosses"           0.16
    " Crime"            0.16
    " boss"             0.15
    "abcdefghijklmnop"  0.15
    "Crime"             0.15
    " Mirror"           0.15
    " crime"            0.15
    Activation Density: 0.027%

    No Known Activations