INDEX
    Explanations

    references to authorship or attribution in text

    Negative Logits
    "otland"       -0.17
    "andy"         -0.16
    "ols"          -0.16
    "nak"          -0.16
    "909"          -0.15
    "alam"         -0.15
    "松"           -0.14
    " Wich"        -0.14
    "ведите"       -0.14
    "ulum"         -0.14

    Positive Logits
    " Trend"        0.15
    "Capability"    0.15
    "aju"           0.15
    " âĩ"           0.14
    "amespace"      0.14
    " trend"        0.14
    " ourselves"    0.14
    "uada"          0.14
    "azu"           0.14
    " int"          0.14
    Activation Density 0.010%
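Activation density is read here as the fraction of sampled tokens on which the feature's activation is above zero; that definition and the helper name `activation_density` are assumptions for illustration. At 0.010%, the feature would fire on roughly 1 token in 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation is strictly above threshold (assumed definition)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: one firing position among 10,000 sampled tokens.
acts = [0.0] * 9_999 + [2.5]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.010%
```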

    No Known Activations