    Explanations

    mentions of authorship and attribution in text

    Negative Logits
    Token         Logit
    "otts"        -0.17
    "ź"           -0.16
    "alin"        -0.15
    "uren"        -0.14
    "ailing"      -0.14
    " l"          -0.14
    "icz"         -0.14
    "imson"       -0.14
    "ze"          -0.14
    "_FACTORY"    -0.14
    Positive Logits
    Token         Logit
    "erm"         0.16
    "yne"         0.15
    "ikip"        0.15
    "dda"         0.14
    " undermin"   0.14
    "meldung"     0.14
    "shima"       0.14
    "ãĤ¦ãĤ£"      0.14
    "é¨"          0.14
    ".fhir"       0.14
    Activation Density: 0.037%

    No Known Activations