INDEX
    Explanations

    instances where a situation is being described as different from what is expected

    the word "Instead" and its various forms as a pivot in discussions or arguments

    Negative Logits
    Token               Logit
    "AZ"                -0.67
    "Condition"         -0.61
    "rament"            -0.59
    " neighbourhood"    -0.59
    "ENTS"              -0.58
    "ental"             -0.57
    "SF"                -0.57
    "ties"              -0.56
    "gin"               -0.55
    "emate"             -0.55
    Positive Logits
    Token               Logit
    " opting"           0.78
    "ples"              0.75
    "terness"           0.75
    "ertodd"            0.70
    "ortun"             0.69
    "chart"             0.66
    "ilon"              0.65
    "zbek"              0.65
    "¬¼"                0.65
    " preferring"       0.62
    Activation Density 0.024%

    No Known Activations