    Explanations

    introduces explanation or contrast

    Negative Logits
    Token            Value
    " thereby"       1.95
    "which"          1.87
    " which"         1.87
    " ensuring"      1.62
    " keeping"       1.61
    "从而"           1.59
    " allowing"      1.55
    " hoping"        1.52
    "Which"          1.47
    " aiming"        1.44
    Positive Logits
    Token            Value
    " exista"        0.85
    " Хотя"          0.81
    " quoique"       0.81
    " existir"       0.80
    " существует"    0.80
    "Exists"         0.80
    " istnieje"      0.80
    " хотя"          0.80
    " esiste"        0.78
    " conviene"      0.75
    Activation Density: 0.068%

    No Known Activations