INDEX
    Explanations

    conjunctions and phrases indicating continuity or addition

    Negative Logits
    Token         Logit
    "atte"        -0.17
    "aises"       -0.16
    " Beled"      -0.15
    "gart"        -0.15
    "ãĥ¼ãĥł"      -0.15
    "gne"         -0.14
    "inded"       -0.14
    "erate"       -0.14
    "oplevel"     -0.14
    "yn"          -0.14
    Positive Logits
    Token         Logit
    "vi"          0.17
    " viol"       0.16
    "bane"        0.16
    "untu"        0.15
    "ills"        0.15
    "-selector"   0.15
    "ibri"        0.15
    "onto"        0.14
    "asta"        0.14
    "609"         0.14
    Activation Density: 0.188%

    No Known Activations