INDEX
    Explanations

    conspiracy-related terms and phrases

    Negative Logits
    Token       Logit
    "ijk"       -0.82
    "Thom"      -0.77
    "inished"   -0.74
    "older"     -0.69
    " Chop"     -0.69
    "oba"       -0.68
    "esa"       -0.66
    "TD"        -0.65
    "zl"        -0.65
    "isha"      -0.64
    Positive Logits
    Token           Logit
    " theorist"     1.47
    " theorists"    1.45
    " theories"     1.34
    " theory"       1.06
    " theor"        1.02
    "ulent"         0.96
    " conspir"      0.93
    " conspiracy"   0.93
    "eering"        0.92
    " hatched"      0.88
    Activation Density: 0.021%

    No Known Activations