INDEX
    Explanations

    phrases related to authority figures making public statements

    references to political positions or statements made publicly

    Negative Logits
     unpop          -0.70
    asionally       -0.57
     predec         -0.56
    choes           -0.54
    eteenth         -0.53
    ommod           -0.53
     Peb            -0.52
     heterogeneity  -0.50
     longstanding   -0.49
    zens            -0.49
    Positive Logits
    ,,,,            0.94
     to             0.87
    """             0.74
     unto           0.74
     towards        0.71
    [/              0.69
     [/             0.69
     thats          0.69
    !!!!            0.67
    ""              0.67
    Act Density 0.780%

    No Known Activations