    Explanations

    phrases discussing political questions and controversies

    Head Attr Weights
    0:0.06
    1:0.02
    2:0.06
    3:0.06
    4:0.03
    5:0.20
    6:0.09
    7:0.07
    8:0.05
    9:0.06
    10:0.16
    11:0.07
    Negative Logits
     halftime   -1.13
    ensions     -1.03
    umn         -0.96
    venants     -0.94
    ukong       -0.91
     Railroad   -0.88
     Quan       -0.88
    Sep         -0.88
     Celest     -0.86
     Verse      -0.85
    Positive Logits
    !).     1.57
    )."     1.57
    ?).     1.56
    )</     1.54
    ).[     1.42
    ).      1.37
    )}      1.36
    ?)      1.36
    !)      1.35
    \)      1.33
    Act Density 0.359%

    No Known Activations