    Explanations

    at the start of clauses

    Negative Logits
     socalled
    0.71
    ↵↵↵
    0.58
     wellknown
    0.57
    ↵↵↵↵↵↵↵↵↵
    0.55
    .""
    0.51
    .}\
    0.50
    .}
    0.49
     thus
    0.49
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
    0.48
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵
    0.47
    Positive Logits
     While
    1.51
    1.51
     Although
    1.44
     Despite
    1.39
     As
    1.35
     Depending
    1.34
     Though
    1.33
     For
    1.31
     This
    1.29
     There
    1.28
    Activation Density 5.117%
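    The activation density figure is the share of token positions on which the feature fires. A minimal sketch, assuming that "fires" means an activation strictly above zero and that the density is reported as a percentage; the function name `activation_density` and the sample values are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of twenty gives a density of 5%.
sample = [0.0] * 19 + [0.7]
print(activation_density(sample))  # 5.0
```

    Under this definition, a density of 5.117% means the feature is active on roughly one token in twenty.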

    No Known Activations