INDEX
    Explanations

    short phrases introducing a topic or statement

    instances of the word "This" and other introductory or demonstrative words

    Negative Logits
    Token          Logit
    "gomery"       -0.62
    "itud"         -0.60
    "aign"         -0.60
    "ificial"      -0.60
    " Cummings"    -0.59
    " rall"        -0.58
    "lie"          -0.57
    " Lyon"        -0.57
    "IDA"          -0.57
    "INO"          -0.57

    Positive Logits
    Token          Logit
    "itialized"    0.83
    "cano"         0.72
    "ĪĴ"           0.62
    "cknowled"     0.61
    " Started"     0.61
    "oran"         0.61
    "ymes"         0.60
    "ths"          0.60
    "geist"        0.59
    " Own"         0.59
    Activation Density: 0.326%

    No Known Activations