INDEX
    Explanations

    instances of the word "Ad", which likely indicates advertisements or references to promotional content

    Negative Logits
    Token       Logit
    " ÑŁ"       -0.14
    "iegel"     -0.14
    "itzer"     -0.14
    "EDIA"      -0.14
    "彦"        -0.14
    "för"       -0.13
    "gli"       -0.13
    " Serif"    -0.13
    "bilir"     -0.13
    "enstein"   -0.13

    Positive Logits
    Token       Logit
    "ri"        0.16
    "uce"       0.16
    " wings"    0.16
    " aw"       0.14
    "/remove"   0.14
    "rians"     0.14
    "ilent"     0.13
    "γη"        0.13
    "ity"       0.13
    "resco"     0.13

    Activation Density: 0.035%

    No Known Activations