    Explanations

    the word "but," indicating a contrast or exception in the text

    NEGATIVE LOGITS
    Token     Logit
    "ught"    -0.17
    "ën"      -0.16
    "iah"     -0.15
    "ullan"   -0.14
    "ption"   -0.14
    "ziel"    -0.14
    "urs"     -0.13
    "   "     -0.13   (whitespace token)
    "اذا"     -0.13
    "him"     -0.13
    POSITIVE LOGITS
    Token     Logit
    " wait"   0.23
    "tery"    0.20
    "cher"    0.20
    " Wait"   0.19
    "tach"    0.18
    "wait"    0.17
    " WAIT"   0.16
    "tern"    0.16
    "Wait"    0.16
    " why"    0.16
    Activation density: 0.085%

    No Known Activations