    Explanations

    author references in academic writing

    references to academic papers, specifically those denoted by "et al."

    Negative Logits
    Token         Logit
    "FUL"         -0.81
    "OUNT"        -0.79
    "canon"       -0.76
    "ardless"     -0.69
    "esses"       -0.68
    "velength"    -0.65
    "ppo"         -0.64
    "finger"      -0.63
    "@#&"         -0.63
    "IFT"         -0.62
    Positive Logits
    Token           Logit
    " seq"          1.19
    "rics"          0.92
    " al"           0.87
    "ween"          0.85
    "hetically"     0.81
    "ree"           0.79
    "iated"         0.79
    "iation"        0.78
    "ching"         0.77
    "sis"           0.77
    Activation Density: 0.015%

    No Known Activations