INDEX
    Explanations

    punctuation and citations in academic writing

    Negative Logits
    orro        -0.15
    ozem        -0.15
    wend        -0.14
    elho        -0.14
    alen        -0.14
    rish        -0.14
    άβ          -0.14
    Harness     -0.14
    éro         -0.14
    asha        -0.14
    Positive Logits
     batchSize  0.14
    boro        0.14
    993         0.14
    ÙĦاÙĦ       0.14
    itarian     0.13
    grass       0.13
     PCA        0.13
     toler      0.13
    225         0.13
    ToLeft      0.13
    Act Density 0.065%

    No Known Activations