INDEX
    Explanations

    articles following "the"

    NEGATIVE LOGITS
    Token               Value
    "mselves"           1.06
    "oretically"        0.95
    "↵↵"                0.89
    " כבר"              0.87
    "jenigen"           0.86
    " proverbial"       0.86
    " spä"              0.86
    ")["                0.85
    " Experten"         0.80
    (no token shown)    0.78
    POSITIVE LOGITS
    Token               Value
    "ف"                 1.20
    "ك"                 1.05
    " and"              1.05
    "И"                 1.02
    "ING"               0.95
    "AR"                0.89
    "во"                0.88
    "по"                0.87
    (no token shown)    0.84
    " to"               0.83
    Act Density 0.000%

    No Known Activations