INDEX
    Explanations

    instances of posting or attribution in written content

    Negative Logits
     rencont
    -0.14
    stin
    -0.14
    لود
    -0.14
    ustum
    -0.14
    ®
    -0.14
    بÙĬÙĨ
    -0.14
    iteli
    -0.14
    icast
    -0.13
     nostalg
    -0.13
    оваÑĤелÑĮ
    -0.13
    Positive Logits
    igure
    0.17
    ania
    0.16
    idor
    0.16
     Mang
    0.15
    áo
    0.15
     Weinstein
    0.14
    ilin
    0.14
    eut
    0.14
    uran
    0.14
    _UNUSED
    0.14
    Act Density 0.022%

    No Known Activations