    Explanations

    references to conspiracy theories and government covert operations

    Negative Logits
    Token            Logit
    "ofire"          -0.14
    "ains"           -0.14
    "elry"           -0.14
    " erupt"         -0.14
    " Inspection"    -0.13
    "ifest"          -0.13
    "ga"             -0.13
    "Inspect"        -0.12
    " Sherlock"      -0.12
    "íd"             -0.12
    Positive Logits
    Token            Logit
    " secret"        0.41
    " Secret"        0.33
    "-secret"        0.31
    " SECRET"        0.30
    " secrets"       0.30
    " classified"    0.29
    "secret"         0.29
    "Secret"         0.29
    "秘"             0.29
    " secretly"      0.28
    Activation Density: 0.051%

    No Known Activations