INDEX
    Explanations

    repetitive usage of the word "the" in various contexts

    Negative Logits
    ","          -0.47
    "I"          -0.44
    " and"       -0.39
    "."          -0.39
    " in"        -0.38
    " I"         -0.37
    (blank)      -0.37
    " for"       -0.37
    "2"          -0.36
    " failed"    -0.36
    Positive Logits
    " ویکی‌پدیا"    0.75
    " increí"       0.72
    "ſſung"         0.72
    "ſelves"        0.69
    " ſei"          0.69
    "ſelf"          0.66
    "GIVEREF"       0.66
    "<unused42>"    0.65
    "<unused23>"    0.64
    "<unused43>"    0.64
    Act Density 0.498%
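
As a rough illustration of what an activation density figure like the one above might measure, the sketch below treats it as the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the source does not say how the 0.498% value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.8, 1.2, 0.3, 2.1, 0.5]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this reading, a density near 0.5% would mean the feature fires on roughly one token position in two hundred.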

    No Known Activations