INDEX
    Explanations

    references to authenticity and manipulation in various contexts

    Negative Logits
    Token            Logit
    "quip"           -0.17
    "inen"           -0.16
    "loat"           -0.14
    " decentral"     -0.14
    " tang"          -0.14
    "vis"            -0.14
    "/bower"         -0.13
    "ंदर"            -0.13
    " loc"           -0.13
    "Shown"          -0.13
    Positive Logits
    Token            Logit
    " artificial"    0.38
    " Artificial"    0.34
    " planned"       0.30
    " artificially"  0.29
    " intentional"   0.29
    " deliberate"    0.28
    " unnatural"     0.26
    " carefully"     0.25
    " synthetic"     0.22
    " engineered"    0.22
    Activation Density: 0.293%

    No Known Activations