INDEX
    Explanations

    instances of the word "the" in various contexts

    Negative Logits
    Token        Logit
    "both"       -0.56
    "/"          -0.53
    " and"       -0.53
    ","          -0.52
    "-"          -0.50
    "whether"    -0.47
    "another"    -0.44
    " both"      -0.43
    "using"      -0.43
    "being"      -0.43
    Positive Logits
    Token               Logit
    " same"             1.32
    " entire"           1.24
    " majority"         1.18
    " entirety"         1.14
    " aforementioned"   1.13
    " latter"           1.10
    " slightest"        1.08
    " meisten"          1.07
    " following"        1.05
    " whole"            1.05
    Activation Density: 3.042%

    No Known Activations