INDEX
    Explanations

    expressions relating to identity and authenticity

    Negative Logits
    Token             Logit
    "bjerg"           -0.17
    "569"             -0.15
    "opic"            -0.14
    "ogg"             -0.14
    "uyu"             -0.14
    "é»"              -0.13
    "ucs"             -0.13
    "createQuery"     -0.13
    "veloper"         -0.13
    "RenderTarget"    -0.13
    Positive Logits
    Token             Logit
    " fake"           0.40
    "fake"            0.37
    " Fake"           0.37
    "Fake"            0.36
    " faker"          0.32
    " pretending"     0.32
    " mask"           0.32
    " masks"          0.30
    " artificial"     0.30
    " pretend"        0.29
    Activation Density: 0.031%

    No Known Activations