    Explanations

    phrases related to publicity and attention

    Head Attr Weights
    Head    Weight
    0       0.03
    1       0.02
    2       0.08
    3       0.06
    4       0.10
    5       0.03
    6       0.03
    7       0.35
    8       0.03
    9       0.05
    10      0.08
    11      0.08
    Negative Logits
    Token           Logit
    "otomy"         -1.62
    "inement"       -1.50
    "ometry"        -1.47
    "omers"         -1.42
    "guided"        -1.38
    " Harmony"      -1.36
    "ternal"        -1.35
    "esthetic"      -1.35
    "uitive"        -1.34
    "itored"        -1.32
    Positive Logits
    Token               Logit
    " Crusade"          1.62
    " libel"            1.58
    "bucks"             1.45
    " oneself"          1.44
    " accuser"          1.42
    " domestically"     1.38
    "Reporting"         1.35
    " fraudulent"       1.35
    " sensational"      1.34
    " tremend"          1.34
    Act Density 0.002%

    No Known Activations