INDEX
    Explanations

    repeated instances of the word "the."

    Negative Logits
    Token         Logit
    "andan"       -0.14
    "inski"       -0.14
    "endum"       -0.14
    " bl"         -0.14
    " further"    -0.14
    "rts"         -0.13
    " earlier"    -0.13
    "amoto"       -0.13
    "("           -0.13
    "insk"        -0.13
    Positive Logits
    Token         Logit
    "interop"     0.19
    "forces"      0.15
    "LEX"         0.15
    "lia"         0.14
    "lah"         0.14
    " пункт"      0.14
    " Quad"       0.14
    "دار"         0.13
    "저"          0.13
    "RAY"         0.13
    Activation Density: 0.049%

    No Known Activations