INDEX
    Explanations

    names of authors and references in academic citations

    Negative Logits
    linky  -0.18
    ahren  -0.17
    isy    -0.15
    ustin  -0.15
    996    -0.14
    ruary  -0.14
    idia   -0.13
    nr     -0.13
    mic    -0.13
    iore   -0.13
    Positive Logits
     Lam    0.17
    hti     0.16
     quadr  0.16
    iveau   0.15
     lam    0.15
     quad   0.15
    erken   0.15
    Ā       0.14
    ãĥªãĤ«  0.14
    igli    0.14
    Act Density 0.010%

    No Known Activations