INDEX
    Explanations

    contrasting statements, particularly focusing on disproving or negating an initial idea

    contrasting statements that introduce clarification or correction

    Negative Logits
    Token             Logit
    "asks"            -0.71
    "illary"          -0.66
    "meta"            -0.64
    "ILLE"            -0.63
    "nat"             -0.62
    "ct"              -0.62
    "holder"          -0.61
    "enter"           -0.60
    "orter"           -0.60
    "ory"             -0.59
    Positive Logits
    Token             Logit
    " rather"         1.38
    " nevertheless"   1.12
    " merely"         1.10
    "rather"          1.09
    " nonetheless"    1.02
    " suffice"        0.98
    " Rather"         0.96
    " instead"        0.91
    " simply"         0.90
    " luckily"        0.84
    Act Density 0.110%

    No Known Activations