INDEX
    Explanations

    authorship or attribution in text

    Negative Logits
    "exus"       -0.19
    "志"         -0.15
    "umer"       -0.14
    "_coeffs"    -0.14
    "ft"         -0.14
    "aho"        -0.14
    "감"         -0.14
    "undi"       -0.14
    "umber"      -0.14
    "ushima"     -0.14
    Positive Logits
    "rette"      0.14
    "born"       0.14
    "бор"        0.14
    "infeld"     0.14
    "aj"         0.14
    " glor"      0.13
    "298"        0.13
    " Born"      0.13
    "-products"  0.13
    "образ"      0.12
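Dashboards of this kind commonly derive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that convention. The names `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical, and the toy weights are illustrative only, not this feature's actual values.

```python
from typing import List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # assumed shape: [d_model][vocab_size]
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive logit effects.

    Hypothetical sketch: effect[t] = sum_i decoder_dir[i] * unembed[i][t].
    """
    d_model = len(decoder_dir)
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect so the most negative tokens come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    # Reverse the ascending order so the most positive tokens come first.
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary.
vocab = ["exus", "born", " Born", "umer"]
unembed = [
    [-1.0, 0.5, 0.25, -0.5],
    [0.0, 0.5, 0.5, 0.0],
]
decoder_dir = [0.2, 0.1]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print(neg)
print(pos)
```

Only the sign and ordering of the effects matter for reading such lists. A positive value means the feature nudges the model toward predicting that token; a negative value means it pushes the model away from it.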
    Activation Density: 0.052%
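Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is strictly positive. Under that assumed definition, 0.052% works out to roughly 5 activating tokens in every 10,000. The function name `activation_density` and the sample data below are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: 5 active tokens out of 10,000.
sample = [0.0] * 9_995 + [1.3, 0.7, 2.1, 0.4, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```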

    No Known Activations