INDEX
    Explanations

    phrases that indicate substitution or replacement

    Negative Logits
    Token             Logit
    "azon"            -0.16
    "agrid"           -0.15
    "ENCHMARK"        -0.15
    "rna"             -0.15
    "udad"            -0.14
    "vla"             -0.14
    "uren"            -0.14
    ".SYSTEM"         -0.14
    "TestCategory"    -0.14
    "quam"            -0.14
    Positive Logits
    Token             Logit
    " substitute"     0.26
    "replace"         0.24
    " replace"        0.24
    " substitution"   0.23
    " replacing"      0.23
    " replaces"       0.22
    " substit"        0.22
    " substitutes"    0.20
    " Substitute"     0.20
    " replacement"    0.20
    Activation Density: 0.065%

    No Known Activations