    Explanations

    references to social media companies and their influence on free speech

    Negative Logits
    "afort"               -0.18
    "برÛĮ"                -0.16
    "oomla"               -0.15
    " Millet"             -0.15
    "äºľ"                 -0.14
    "amilia"              -0.14
    "roit"                -0.14
    " pyramid"            -0.14
    " pul"                -0.14
    "ounge"               -0.14
    Positive Logits
    "BuilderInterface"     0.16
    "antic"                0.15
    "öl"                   0.15
    "-inline"              0.15
    "su"                   0.14
    "uest"                 0.14
    "Ìĥ"                   0.14
    " ÑĥÑģл"               0.14
    "éo"                   0.13
    "DAC"                  0.13
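    The two lists above are the tokens with the most negative and most positive logit effects for this feature. The sketch below shows one way to produce such ranked lists from a mapping of token strings to logit values. It is a minimal illustration, not this dashboard's actual code: the helper name `rank_logit_effects` and the dict input format are hypothetical, and the sample values are taken from the lists above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def rank_logit_effects(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (most_negative, most_positive) token/value pairs, k of each.

    Hypothetical helper for illustration: `effects` maps a token string
    (leading spaces preserved) to its logit effect.
    """
    # Sort ascending by value; Python's sort is stable, so ties keep insertion order.
    ordered = sorted(effects.items(), key=lambda kv: kv[1])
    # The k smallest values are the most negative effects.
    most_negative = ordered[:k]
    # Walk the ascending list backwards to get the k largest values.
    most_positive = ordered[::-1][:k]
    return most_negative, most_positive


if __name__ == "__main__":
    sample = {
        "afort": -0.18, "oomla": -0.15, " Millet": -0.15,
        "BuilderInterface": 0.16, "antic": 0.15, "su": 0.14,
    }
    negative, positive = rank_logit_effects(sample, k=2)
    print("Negative:", negative)
    print("Positive:", positive)
```

    If `k` exceeds half the vocabulary size, the two lists will overlap; real dashboards rank over the full vocabulary, so this is rarely an issue in practice.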
    Activation Density: 0.035%
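    An activation density of 0.035% means the feature fires on roughly 35 of every 100,000 tokens. The sketch below computes density as the percentage of tokens whose activation exceeds a threshold. This is an assumption about the definition: the page does not state its exact threshold, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper; assumes "active" means activation > threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    # Express the active fraction as a percentage of all tokens.
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 35 active tokens out of 100,000 total.
    acts = [0.0] * 99_965 + [1.0] * 35
    print(f"{activation_density(acts):.3f}%")
```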

    No Known Activations