INDEX
    Explanations

    the word "the" when followed by short phrases

    instances of the word "the."

    Negative Logits
    " instead"    -0.76
    "Layer"       -0.74
    "zai"         -0.74
    ".<"          -0.71
    "worn"        -0.71
    " whilst"     -0.71
    " whereas"    -0.70
    ".</"         -0.68
    "rade"        -0.68
    " because"    -0.68
    Positive Logits
    " aforementioned"    1.17
    " latter"            1.11
    " latest"            0.94
    "ses"                0.89
    " slightest"         0.88
    " same"              0.88
    " largest"           0.86
    " greatest"          0.85
    "oret"               0.85
    " entire"            0.84
    Act Density 0.684%

    No Known Activations