    Explanations

    contrasting relationships or ideas

    the word "yet," indicating a contrast or contradiction in various contexts

    Negative Logits
    タ        -0.79
    esp       -0.77
    ノ        -0.77
    ufact     -0.76
    ゼウス    -0.74
    ウス      -0.73
    エル      -0.73
    ァ        -0.73
    /         -0.72
    tein      -0.72
    Positive Logits
     somehow         1.17
     despite         1.07
     again           0.96
     strangely       0.93
     another         0.89
     nonetheless     0.87
     whenever        0.82
     somew           0.81
     inexpl          0.79
     nevertheless    0.78
    Act Density 0.046%

    No Known Activations