INDEX
    Explanations

    mentions of specific locations within sentences

    the phrase "the" and its various usages across sentences

    Negative Logits
    Token         Logit
    "itiz"        -0.73
    "ional"       -0.72
    "#$"          -0.65
    "irds"        -0.64
    "abel"        -0.61
    "edly"        -0.61
    "yet"         -0.61
    " owes"       -0.61
    "abe"         -0.60
    "uty"         -0.58
    Positive Logits
    Token          Logit
    " meantime"    1.69
    " midst"       1.34
    " absence"     1.26
    " aftermath"   1.23
    " case"        1.05
    " ensuing"     1.04
    " wake"        1.01
    " simplest"    0.96
    " meanwhile"   0.96
    " context"     0.96
    Activation Density: 0.137%

    No Known Activations