    Explanations

    references to citations needed within a text

    instances of citations or references to sources

    Negative Logits
    inav            -0.86
    pora            -0.82
    milo            -0.77
    ratulations     -0.71
    roups           -0.64
    quer            -0.63
    ynes            -0.62
    gradation       -0.61
    wear            -0.61
    oshenko         -0.60
    Positive Logits
    =]              1.04
     omitted        1.03
    ]"              0.86
     redacted       0.83
     footnote       0.78
    ])              0.76
    ]),             0.76
    ],[             0.76
    ?]              0.75
    ]               0.74
    Activation Density: 0.038%

    No Known Activations