INDEX
    Explanations

    references to specific individuals or entities and their associated attributes or actions

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.05
    3:0.07
    4:0.24
    5:0.02
    6:0.05
    7:0.33
    8:0.03
    9:0.02
    10:0.05
    11:0.06
    Negative Logits
    "lance"         -1.45
    "quartered"     -1.37
    "wark"          -1.36
    "olk"           -1.30
    "gered"         -1.27
    "rouse"         -1.25
    "860"           -1.21
    " YORK"         -1.19
    "eln"           -1.19
    "usk"           -1.17
    Positive Logits
    " names"          1.87
    " positives"      1.85
    " prominently"    1.75
    " redacted"       1.68
    " boxes"          1.62
    " markers"        1.61
    " lists"          1.59
    " items"          1.58
    " similarities"   1.54
    " negatives"      1.51
    Activation Density 0.007%

    No Known Activations