INDEX
    Explanations

    references to research papers

    references to research papers and academic publications

    Negative Logits
    "alez"         -0.88
    "akening"      -0.70
    " Lowell"      -0.64
    "endor"        -0.64
    "eties"        -0.60
    "xon"          -0.59
    " Brittany"    -0.58
    " Harmony"     -0.58
    "ichita"       -0.57
    "Sax"          -0.57
    Positive Logits
    "Paper"        1.21
    "clip"         1.02
    " paper"       0.97
    "flies"        0.91
    " papers"      0.89
    " Paper"       0.89
    " towels"      0.86
    "papers"       0.84
    "paper"        0.81
    " towel"       0.79
    Activation Density: 0.014%

    No Known Activations