INDEX
    Explanations

    mentions of specific locations or settings

    occurrences of the word "the" in various contexts

    Negative Logits
    itiz         -0.70
    emate        -0.66
    ulence       -0.66
    pointers     -0.64
    witch        -0.63
    anism        -0.63
    antes        -0.62
    pers         -0.60
    ional        -0.60
    irds         -0.60
    Positive Logits
     meantime     1.45
     midst        1.24
     aftermath    1.16
     absence      1.06
     guise        1.04
     simplest     1.04
     context      1.02
     same         0.97
     ensuing      0.95
     case         0.95
    Activation Density: 0.148%

    No Known Activations