    Explanations

    numerical values or identifiers within the text

    Negative Logits
    "kili"           -0.16
    "newInstance"    -0.16
    "lier"           -0.16
    "omik"           -0.15
    "ilm"            -0.15
    "rsa"            -0.14
    "afort"          -0.14
    " bew"           -0.14
    " گر"            -0.14
    "imli"           -0.14
    Positive Logits
    "fried"          0.15
    " undisclosed"   0.15
    "ritz"           0.15
    " Fritz"         0.15
    " chan"          0.15
    "agle"           0.14
    ".fake"          0.14
    "etical"         0.14
    "th"             0.14
    " Weiss"         0.13
    Activation Density: 0.214%

    No Known Activations