INDEX
    Explanations

    words related to historical events, political discussions, and policy activities, with a focus on specific details and narratives

    phrases indicating contrast or exceptions in contexts

    Head Attr Weights
    0:0.23
    1:0.03
    2:0.06
    3:0.13
    4:0.03
    5:0.10
    6:0.07
    7:0.02
    8:0.06
    9:0.11
    10:0.08
    11:0.02
    Negative Logits
    " Blend"      -1.14
    " Yep"        -1.12
    " dear"       -1.08
    " Vampire"    -1.08
    " Pick"       -1.04
    " Wiz"        -1.03
    "Yep"         -1.02
    " Mats"       -1.02
    "guard"       -1.00
    " Picks"      -1.00
    Positive Logits
    "xual"          1.38
    "agara"         1.34
    "olson"         1.33
    " glim"         1.28
    "ihara"         1.25
    " acknowled"    1.24
    " anecd"        1.24
    "glers"         1.17
    "lihood"        1.12
    "ulton"         1.12
    Act Density 0.117%

    No Known Activations