INDEX
    Explanations

    claims and statements regarding actions or behaviors

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.17
    3:0.26
    4:0.08
    5:0.04
    6:0.05
    7:0.05
    8:0.06
    9:0.06
    10:0.10
    11:0.04
    Negative Logits
     Coordinator  -1.71
     Commodore    -1.49
     Chr          -1.44
     Garry        -1.43
    ainment       -1.41
     Corporate    -1.39
     POLITICO     -1.38
     Restoration  -1.38
    GY            -1.37
     Roh          -1.37
    Positive Logits
    reply         1.75
    itutes        1.56
    trigger       1.54
     prostitutes  1.53
    votes         1.51
     violates     1.50
    ocaust        1.46
                  1.44
    wrong         1.42
     "...         1.42
    Act Density 0.021%

    No Known Activations