INDEX
    Explanations

    phrases indicating a contrast or contradiction to some stated beliefs or expectations

    phrases that challenge popular beliefs or narratives

    Negative Logits
    eport          -0.83
    ahead          -0.81
    estones        -0.80
    iless          -0.74
    enary          -0.74
    erning         -0.71
    gins           -0.71
    hene           -0.71
    pan            -0.71
    between        -0.70
    Positive Logits
     expectations   1.08
     belief         0.98
     expectation    0.94
     stereotype     0.92
     stereotypes    0.88
     popular        0.87
     suggestion     0.84
     intuition      0.82
     appearances    0.82
     assertions     0.81
    Activation Density: 0.059%

    No Known Activations