    Explanations

    references or citations within a text

    references to citations or sources used in arguments

    Negative Logits
    ensibly     -0.77
    cos         -0.75
    Sphere      -0.71
    byss        -0.69
    mire        -0.69
    ateurs      -0.68
    icion       -0.67
    mid         -0.66
    encia       -0.66
    quer        -0.65
    Positive Logits
     similarities       0.95
     preced             0.95
     accomplishments    0.93
     debunked           0.90
     example            0.87
     similarity         0.85
     precedent          0.85
     examples           0.84
     shortcomings       0.83
     inaccur            0.83
    Act Density 0.324%

    No Known Activations