INDEX
    Explanations

    phrases indicating comparison or contrast

    Negative Logits
    ".this"        -0.15
    "तम"           -0.15
    "riter"        -0.14
    "なら"         -0.14
    " hence"       -0.14
    "@update"      -0.14
    " therefore"   -0.14
    "trl"          -0.13
    "uien"         -0.13
    "plx"          -0.13
    Positive Logits
    " although"    0.85
    "although"     0.69
    " Although"    0.66
    "虽然"         0.63
    " while"       0.63
    "Although"     0.62
    " though"      0.58
    " aunque"      0.56
    "虽"           0.52
    " While"       0.52
    Activation Density: 0.440%

    No Known Activations