INDEX
    Explanations

    phrases that indicate a contrasting or surprising element in a sentence

    repetitive phrases that contrast or introduce conditions

    Negative Logits
    エル      -0.78
    ノ        -0.78
    urated    -0.77
    ゼウス    -0.76
    esp       -0.76
    tein      -0.75
    ァ        -0.73
    ufact     -0.72
    ウス      -0.72
    タ        -0.70
    Positive Logits
     somehow         1.19
     despite         0.99
     strangely       0.96
     nonetheless     0.95
     again           0.94
     another         0.89
     nevertheless    0.87
     inexpl          0.84
     mirac           0.82
     somew           0.82
    Act Density 0.039%

    No Known Activations