    Explanations

    determiners that begin phrases in written text, such as "The" before specific nouns

    occurrences of the word "The"

    Negative Logits
     reserve      -0.70
     wound        -0.68
     rank         -0.68
     luck         -0.66
     arch         -0.64
     drop         -0.63
     assigned     -0.63
     care         -0.63
     favor        -0.63
     equivalent   -0.62
    Positive Logits
    The           2.30
    There         1.72
    ccording      1.72
    THE           1.64
    This          1.64
    When          1.61
    Both          1.58
    It            1.58
    While         1.56
    Our           1.56
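The logit lists above rank vocabulary tokens by how much this feature raises or lowers their output logits. The sketch below shows one common way such lists are produced. It assumes (this is not stated on the page) that a token's logit effect is the dot product of the feature's decoder vector with that token's unembedding column, and it uses tiny hypothetical vectors in place of real model weights. The function names `logit_effects` and `top_and_bottom` are illustrative only.

```python
from typing import Dict, List, Tuple

# Assumption: logit effect of a feature on a token = decoder_vec . W_U[:, token].
# This mirrors a common SAE-dashboard convention, not a confirmed method for this page.

def logit_effects(decoder_vec: List[float], w_u: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder vector with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col)) for tok, col in w_u.items()}

def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

if __name__ == "__main__":
    # Hypothetical 2-dimensional unembedding; a leading space marks a space-prefixed token.
    w_u = {"The": [1.0, 0.5], " reserve": [-1.0, 0.0], "There": [0.5, 0.5]}
    top, bottom = top_and_bottom(logit_effects([2.0, 1.0], w_u), k=2)
    print("positive:", top)
    print("negative:", bottom)
```

With real weights, the same ranking over the full vocabulary would yield lists shaped like the ones above; only the top and bottom ten are shown on the page.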
    Activation density: 0.231%
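The density figure above is a percentage of tokens. A minimal sketch of how such a number can be computed follows, assuming (not confirmed by the page) that a token counts as "active" when the feature's activation exceeds zero over some sample of text. The `threshold` parameter and the function name are hypothetical.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds the threshold.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # 231 active tokens out of 100,000 sampled gives a density of about 0.231%.
    sample = [1.0] * 231 + [0.0] * (100_000 - 231)
    print(f"{activation_density(sample):.3f}%")
```

Under this definition, 0.231% means the feature fires on roughly 1 token in 430, which is consistent with a feature tied to a specific word such as "The".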

    No Known Activations