    Explanations

    words and phrases indicating temporal references and historical context

    Head Attr Weights
    0:0.01
    1:0.02
    2:0.07
    3:0.04
    4:0.02
    5:0.06
    6:0.10
    7:0.13
    8:0.07
    9:0.05
    10:0.06
    11:0.31
    Negative Logits
    "upper"        -1.34
    " tomorrow"    -1.18
    " Miracle"     -1.16
    "ief"          -1.14
    " Mermaid"     -1.13
    "verse"        -1.13
    " Doors"       -1.12
    "exit"         -1.10
    " doorstep"    -1.10
    "lite"         -1.09
    Positive Logits
    " acknow"        1.36
    " confir"        1.36
    " reviewers"     1.20
    " examples"      1.20
    " deployments"   1.15
    " disclaim"      1.14
    "atell"          1.13
    "merce"          1.12
    "arlane"         1.10
    "iths"           1.09
    Act Density 0.023%

    No Known Activations