INDEX
    Explanations

    references to individuals, particularly those with honorifics or titles

    Negative Logits
    umper        -0.15
    ãĥ¼ãĥį       -0.15
    ɵ            -0.15
    tas          -0.15
    obble        -0.15
    orris        -0.15
    елеÑĦ        -0.14
    asca         -0.14
    utenberg     -0.14
    oodoo        -0.14
    Positive Logits
    163          0.15
     redraw      0.15
    anna         0.15
    901          0.14
    645          0.14
    247          0.14
    empt         0.13
     Authors     0.13
     Laure       0.13
    riot         0.13
    Activation Density: 0.035%

    No Known Activations