    Explanations

    words related to emphasis or clarification of a statement, often indicating a contrast between appearances and reality

    words that indicate clarification or contradiction of a statement

    Negative Logits
    "regate"     -0.83
    "inav"       -0.72
    "enaries"    -0.71
    "idences"    -0.69
    "might"      -0.69
    "ortmund"    -0.68
    " Awakens"   -0.68
    "rug"        -0.66
    "doms"       -0.65
    "would"      -0.65

    Positive Logits
    " synonymous"    0.95
    " supposed"      0.93
    " irrelevant"    0.93
    " considered"    0.93
    " indicative"    0.93
    " incompatible"  0.90
    " regarded"      0.89
    " going"         0.88
    " problematic"   0.87
    " worth"         0.87

    Activation Density: 0.281%

    No Known Activations
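
    The logit lists above are commonly read as the tokens whose output logits this feature most increases (positive) or decreases (negative). The sketch below is a minimal illustration under stated assumptions, not this page's actual pipeline: it assumes each score is the dot product of the feature's decoder direction with a token's unembedding column, and that activation density is the fraction of sampled tokens on which the feature's activation is positive. The toy vectors, the token names, and the helpers logit_effects, top_and_bottom, and activation_density are all hypothetical; a real dashboard may apply extra normalization before ranking.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every token by dot(direction, unembed[:, j]).

    Assumption: `unembed` is laid out as d_model rows x vocab_size columns,
    and `direction` is the feature's (hypothetical) decoder vector.
    """
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores: List[Tuple[str, float]], k: int):
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    return ranked[::-1][:k], ranked[:k]


def activation_density(acts: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires (activation > 0)."""
    return sum(1 for a in acts if a > 0) / len(acts)


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 3, vocabulary of 4 placeholder tokens.
    direction = [1.0, 0.0, -1.0]
    unembed = [
        [2.0, 1.0, 0.0, -1.0],
        [0.5, 0.5, 0.5, 0.5],
        [0.0, 0.5, 1.5, 1.0],
    ]
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]

    top, bottom = top_and_bottom(logit_effects(direction, unembed, vocab), 2)
    print("Positive:", top)      # [('tok_a', 2.0), ('tok_b', 0.5)]
    print("Negative:", bottom)   # [('tok_d', -2.0), ('tok_c', -1.5)]
    print(f"Density: {activation_density([0.0, 0.0, 3.2, 0.0]):.3%}")  # 25.000%
```

    Under these assumptions, a density of 0.281% would mean the feature is active on roughly 3 of every 1,000 sampled tokens.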