INDEX
    Explanations

    phrases related to comparison and contrast

    instances of punctuation, specifically commas, often indicating lists or separation in thoughts

    Negative Logits
    "zos"     -0.82
    "sers"    -0.65
    "tv"      -0.62
    "oir"     -0.62
    "hn"      -0.61
    "zers"    -0.61
    "lvl"     -0.61
    "aron"    -0.61
    "lees"    -0.60
    "aris"    -0.60
    Positive Logits
    " respectively"    0.89
    " curfew"          0.70
    " depending"       0.69
    "depending"        0.65
    "iffe"             0.57
    " disclaim"        0.56
    " dispos"          0.55
    " whichever"       0.55
    " premature"       0.55
    "ilation"          0.54
    Activation Density: 0.237%

    No Known Activations