INDEX
    Explanations

    mentions of a specific entity or topic within a longer discussion context

    instances of the word "the"

    Negative Logits
        "itiz"        -0.77
        "ional"       -0.72
        "irds"        -0.68
        "abe"         -0.64
        "Topics"      -0.62
        "pointers"    -0.62
        " owes"       -0.62
        "#$"          -0.62
        " malf"       -0.61
        "uty"         -0.61
    Positive Logits
        " meantime"    1.66
        " midst"       1.37
        " aftermath"   1.26
        " absence"     1.23
        " context"     1.07
        " simplest"    1.05
        " guise"       1.04
        " ensuing"     1.01
        " wake"        1.00
        " meanwhile"   0.98
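    A minimal sketch of how top positive and negative logit lists like these are commonly produced: project the feature's decoder direction through the model's unembedding matrix and sort the resulting per-token scores. This is an assumption about the method, not a statement of how this page computed its values; real dashboards may apply extra steps (for example folding in a final layer norm). The names `decoder_dir`, `W_U`, `vocab`, and `top_logits` are hypothetical, and the toy matrices are made up for illustration.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature direction promotes them.

    decoder_dir: shape (d_model,), hypothetical feature decoder vector
    W_U: shape (d_model, vocab_size), hypothetical unembedding matrix
    vocab: list of token strings of length vocab_size
    """
    logits = decoder_dir @ W_U  # one score per vocabulary token
    order = np.argsort(logits)  # indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional model and a 3-token vocabulary (made-up numbers).
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0],
                [3.0, 3.0, 3.0]])
vocab = [" midst", "itiz", " meantime"]
neg, pos = top_logits(decoder_dir, W_U, vocab, k=1)
print(neg, pos)
```

    Leading spaces in token strings such as " meantime" are kept because, in BPE-style vocabularies, they mark a distinct token from the same word without a space.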
    Activation Density 0.167%
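    Activation density is usually read as the fraction of token positions on which a feature is active, so 0.167% corresponds to roughly 1 token in 600. The sketch below assumes "active" means an activation strictly above a threshold of zero; that threshold and the function name `activation_density` are illustrative assumptions, not this page's documented definition.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption for illustration.
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy example: a feature that fires on 10 out of 6000 token positions.
acts = np.zeros(6000)
acts[:10] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.167%"
```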

    No Known Activations