INDEX
    Explanations

    words associated with gradients and measurements of performance or risk

    Head Attr Weights
    0:0.07
    1:0.14
    2:0.04
    3:0.05
    4:0.04
    5:0.27
    6:0.05
    7:0.03
    8:0.05
    9:0.10
    10:0.07
    11:0.04
    Negative Logits
     nomination     -1.44
     Born           -1.36
     allegiance     -1.34
     affiliation    -1.33
     UD             -1.32
     appearance     -1.31
     Preferred      -1.31
     endors         -1.30
    reve            -1.29
    ndra            -1.29
    Positive Logits
    WARE            1.64
    ipel            1.60
                    1.59
    Balt            1.56
    PLIED           1.49
     istg           1.44
     Grimoire       1.40
     pandemonium    1.37
    ograp           1.37
     sqor           1.36
    Act Density 0.014%

    No Known Activations