INDEX
    Explanations

    patterns of citation and reference formatting

    Negative Logits
    рок       -0.17
    SF        -0.16
    ystal     -0.16
    unks      -0.16
    etsk      -0.16
    JV        -0.15
    izzo      -0.14
    xcf       -0.14
    oin       -0.14
    ington    -0.14
    Positive Logits
    ansen      0.21
    ONES       0.21
    bara       0.19
    eline      0.19
    ansson     0.19
    affe       0.18
    olly       0.18
    agers      0.17
     ones      0.17
    aks        0.17
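    The page lists the logit values but does not say how they were computed. A common approach for sparse-autoencoder feature dashboards, assumed here and not confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector and report the most positive and most negative results. A minimal pure-Python sketch under that assumption; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a feature's decoder direction with each token's
    unembedding vector (assumed definition of a "logit" entry)."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative
```

    For example, `logit_effects([1.0, -0.5], {"a": [0.2, 0.0], "b": [0.0, 0.4], "c": [0.1, 0.1]})` scores `a` at 0.2, `b` at -0.2 and `c` at about 0.05, so `top_and_bottom(..., 2)` ranks `a` first on the positive side and `b` first on the negative side.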
    Activation Density: 0.023%

    No Known Activations
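    The page gives the density figure without defining it. Assuming it means the percentage of sampled token positions on which the feature's activation is above zero, 0.023% works out to about 23 active positions per 100,000 tokens. A minimal sketch under that assumption; `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumed definition: "active" means strictly greater than the threshold,
    which defaults to zero.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

    For example, `activation_density([1.0] * 23 + [0.0] * 99_977)` gives 0.023, matching the figure above.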