INDEX
    Explanations

    mentions of U.S. states and their relationship to specific policies or events

    Head Attr Weights
    0:0.18
    1:0.01
    2:0.11
    3:0.03
    4:0.05
    5:0.03
    6:0.06
    7:0.02
    8:0.07
    9:0.07
    10:0.14
    11:0.17
    Negative Logits
    -1.19
    20439
    -1.17
     subtract
    -1.17
    Reviewer
    -1.16
     separ
    -1.14
     conven
    -1.10
    icides
    -1.08
     leve
    -1.06
    Redditor
    -1.06
     derogatory
    -1.02
    Positive Logits
    ember
    1.34
    Mars
    1.32
    phia
    1.29
    chell
    1.25
     respectively
    1.22
    Web
    1.21
     etc
    1.17
    runners
    1.16
    Share
    1.16
    Amazon
    1.16
    Act Density 0.020%

    No Known Activations