INDEX
    Explanations

    phrases or sentences that introduce a contrast or counterpoint

    the word "that" used in various contexts

    Negative Logits
    姫             -0.75
    sbm           -0.74
    byss          -0.72
    pots          -0.72
    pill          -0.72
    pecially      -0.72
    hops          -0.71
     dstg         -0.71
    olics         -0.71
    adelphia      -0.71
    Positive Logits
     doesn        1.00
     ignores      0.93
     hasn         0.92
     doesnt       0.90
     nonetheless  0.89
     depends      0.89
     differs      0.86
     isn          0.85
     didn         0.84
     wasn         0.84
    Act Density 0.106%

    No Known Activations