INDEX
    Explanations

    references to deceit and misinformation

    Indicates something is not genuine or real

    Negative Logits
    Token               Logit
    "wapV"              -0.42
    " Pill"             -0.37
    (no token shown)    -0.36
    "eneuve"            -0.36
    " TextAlign"        -0.35
    "windowFixed"       -0.35
    "StreetMap"         -0.34
    " vård"             -0.34
    " собі"             -0.33
    "OGND"              -0.33
    Positive Logits
    Token               Logit
    " fake"             0.91
    " faked"            0.87
    "fake"              0.84
    " phony"            0.80
    " faking"           0.79
    "Fake"              0.75
    " fakes"            0.74
    " Fake"             0.73
    "FAKE"              0.71
    " artificial"       0.69
    Act Density 0.478%

    No Known Activations