INDEX
    Explanations

    occurrences of the word "and" in the text

    Negative Logits
    "RenderAtEndOf"    -0.84
    "Rujuakan"         -0.71
    " zwiſchen"        -0.71
    " pinulongan"      -0.70
    " deſſen"          -0.68
    "<unused41>"       -0.68
    "<unused68>"       -0.68
    "<unused23>"       -0.68
    "<unused74>"       -0.68
    "<pad>"            -0.68
    Positive Logits
    " I"               0.48
    " hence"           0.48
    " it"              0.47
    " we"              0.47
    " but"             0.46
    "I"                0.46
    "but"              0.44
    " which"           0.44
    " is"              0.43
    (blank)            0.42
    Act Density 0.277%

    No Known Activations