    Explanations

    references to conspiracy theories and their proponents

    Negative Logits
    bourg          -0.88
    pex            -0.84
    emouth         -0.81
    atal           -0.76
    amen           -0.73
    emale          -0.72
    furt           -0.71
    artney         -0.70
    Delivery       -0.69
    ophon          -0.68

    Positive Logits
     theories       1.14
     theorists      0.92
     theorist       0.91
     debunk         0.90
     concoct        0.86
     conspiracy     0.84
     theor          0.84
     abound         0.81
     explanations   0.81
     twist          0.80

    Activation Density: 0.021%

    No Known Activations