    Explanations

    conjunctions followed by content that introduces a contrasting or opposite idea

    instances of the word "But" indicating contrast or exception

    Negative Logits
        Token          Logit
        heads          -0.69
         segment       -0.68
        .","           -0.61
        fell           -0.60
         sym           -0.59
        ¯¯¯¯           -0.58
         ceremony      -0.57
         award         -0.56
        ization        -0.56
         paths         -0.55
    Positive Logits
        Token          Logit
        tons           1.33
        romeda         0.93
         alas          0.90
        theless        0.88
        withstanding   0.85
        thodox         0.83
        chers          0.83
        ts             0.82
        tif            0.79
        anamo          0.78
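    Logit lists like the two above are commonly produced by projecting the feature's decoder direction onto each token's unembedding vector and ranking tokens by the result. The sketch below shows that ranking under stated assumptions: a plain dot product with the final layer norm ignored, and toy 2-dimensional vectors. The function names `logit_effects` and `top_and_bottom`, and all numbers in the example, are hypothetical and are not this dashboard's data.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector.

    Assumption: no final layer norm or scaling is applied before the projection.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical toy data: a 2-d "model" with four tokens (leading spaces kept as part of the token).
decoder_dir = [1.0, -0.5]
unembed = {
    "theless": [0.6, -0.4],
    " segment": [-0.5, 0.3],
    " alas": [0.5, -0.2],
    "heads": [-0.4, 0.6],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([tok for tok, _ in positive])  # → ['theless', ' alas']
print([tok for tok, _ in negative])  # → ['heads', ' segment']
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real model would use matrix operations over the full vocabulary instead of a Python dictionary.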
    Activation Density: 0.067%
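    Activation density is usually taken as the fraction of sampled token positions on which the feature is active, so 0.067% corresponds to roughly 67 active positions per 100,000 tokens. A minimal sketch, assuming "active" means an activation strictly above zero; the function `activation_density` and the synthetic activations are hypothetical, not this feature's recorded data.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Synthetic example: 67 active positions out of 100,000 tokens.
acts = [0.0] * 99_933 + [1.2] * 67
density = activation_density(acts)
print(f"{density:.3%}")  # → 0.067%
```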

    No Known Activations