INDEX
    Explanations

    references to conspiracy theories and related concepts

    Negative Logits
    " benchmark"     -0.15
    "жень"           -0.15
    " Benchmark"     -0.15
    "Preview"        -0.15
    " preview"       -0.14
    " Preview"       -0.14
    "preview"        -0.14
    "_preview"       -0.14
    " Advice"        -0.14
    "Benchmark"      -0.13
    Positive Logits
    " theories"      0.38
    " theory"        0.35
    " Theory"        0.32
    "Theory"         0.31
    "theory"         0.30
    " conspiracy"    0.29
    " THEORY"        0.27
    " theorists"     0.27
    " teor"          0.26
    " hypothesis"    0.26
    Activation Density: 0.189%

    No Known Activations