    Explanations

    sources or attributions in a document

    references to various sources or citations in a text

    NEGATIVE LOGITS
    Token         Logit
    "estern"      -0.83
    "oģ"          -0.76
    "okers"       -0.71
    "oso"         -0.69
    " destro"     -0.68
    "apo"         -0.67
    "hma"         -0.67
    "psey"        -0.67
    "eg"          -0.67
    " cumbers"    -0.65
    POSITIVE LOGITS
    Token          Logit
    " Sources"     1.00
    "Fed"          0.94
    " Source"      0.92
    "source"       0.83
    "Forge"        0.81
    "ource"        0.81
    "Republic"     0.77
    "books"        0.76
    "Cub"          0.74
    " Republic"    0.74
    Activation Density: 0.016%

    No Known Activations