INDEX
    Explanations

    references to images or representations of individuals in various contexts

    Head Attr Weights
    Head  Weight
    0     0.03
    1     0.02
    2     0.07
    3     0.08
    4     0.34
    5     0.04
    6     0.06
    7     0.09
    8     0.05
    9     0.05
    10    0.06
    11    0.07
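    The listed head weights sum to 0.96 (the shortfall is rounding), and head 4 carries about a third of the total. The sketch below shows one common way to compute per-head weights for attention-output SAE features. It assumes, though the page does not say so, that each head's weight is the L2 norm of that head's d_head-sized slice of the feature's decoder vector, normalized across heads. The helper name and the toy dimensions are hypothetical.

```python
import numpy as np

def head_attribution_weights(decoder_vec: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical helper (assumed method): split a feature's decoder vector
    (length n_heads * d_head) into per-head slices and return each slice's
    L2 norm as a fraction of the summed norms."""
    decoder_vec = np.asarray(decoder_vec, dtype=float)
    if decoder_vec.size % n_heads != 0:
        raise ValueError("decoder vector length must be divisible by n_heads")
    per_head = decoder_vec.reshape(n_heads, -1)   # shape (n_heads, d_head)
    norms = np.linalg.norm(per_head, axis=1)      # one L2 norm per head
    return norms / norms.sum()                    # normalize so the weights sum to 1

# Toy decoder vector: 12 heads with d_head = 4; head 4's slice is 5x larger.
vec = np.ones(12 * 4)
vec[4 * 4:5 * 4] = 5.0
weights = head_attribution_weights(vec, n_heads=12)
print(int(np.argmax(weights)), round(float(weights[4]), 4))  # 4 0.3125
```

    In this toy case the enlarged head receives 0.3125 of the attribution (norm 10 out of a total of 32), and every other head receives 0.0625. That concentrated profile resembles head 4's 0.34 above.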
    Negative Logits
    Token            Logit
    " Williamson"    -1.50
    " believes"      -1.37
    " understands"   -1.32
    "quez"           -1.30
    " recognized"    -1.29
    " interpreted"   -1.28
    " suppose"       -1.27
    "opal"           -1.27
    " emerged"       -1.27
    "owder"          -1.26
    Positive Logits
    Token            Logit
    "selves"         1.49
    "Tact"           1.47
    " Goodbye"       1.46
    " goodbye"       1.45
    " Alone"         1.42
    "xon"            1.41
    "bye"            1.39
    "xit"            1.39
    "thood"          1.39
    "anqu"           1.38
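    Positive and negative logit lists of this kind are usually produced by projecting the feature's output direction through the model's unembedding matrix and then reading off the highest and lowest entries. The sketch below works under that assumption. The page does not describe its exact procedure, for example whether an output projection or layer norm is folded in first. The function `top_bottom_logits`, the six-token vocabulary, and the 2-d unembedding matrix are hypothetical toy values, with a few numbers borrowed from the lists above.

```python
import numpy as np

def top_bottom_logits(direction, W_U, vocab, k=3):
    """Project a feature direction of shape (d_model,) through an unembedding
    matrix of shape (d_model, vocab_size); return the k highest- and k
    lowest-logit tokens as (token, logit) pairs."""
    logits = direction @ W_U                 # shape (vocab_size,)
    order = np.argsort(logits)               # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Hypothetical toy model: d_model = 2, vocabulary of 6 tokens.
vocab = [" goodbye", " believes", "selves", " understands", " Alone", "quez"]
W_U = np.array([
    [1.45, -1.37, 1.49, -1.32, 1.42, -1.30],  # component along the feature direction
    [0.20,  0.10, -0.30, 0.00, 0.40,  0.10],  # orthogonal component (ignored below)
])
direction = np.array([1.0, 0.0])
pos, neg = top_bottom_logits(direction, W_U, vocab, k=2)
print(pos)  # [('selves', 1.49), (' goodbye', 1.45)]
print(neg)  # [(' believes', -1.37), (' understands', -1.32)]
```

    The toy values contain no ties, so the `argsort` order is deterministic. With real vocabularies, equal logits can appear in either order.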
    Activation Density: 0.000%

    No Known Activations
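    Activation density is the share of sampled tokens on which the feature fires. The displayed 0.000% is consistent with the page listing no known activations. The sketch below assumes a flat array of activations over sampled tokens and a strictly-positive firing threshold. The page gives neither the sample size nor the threshold, and `activation_density` is a hypothetical name.

```python
import numpy as np

def activation_density(acts, threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float).ravel()
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return 100.0 * float(np.mean(acts > threshold))

dead = np.zeros(10_000)        # a feature that never fires
sparse = np.zeros(10_000)
sparse[:3] = 0.8               # fires on 3 of 10,000 tokens
print(f"{activation_density(dead):.3f}%")    # 0.000%
print(f"{activation_density(sparse):.3f}%")  # 0.030%
```

    The density is shown to three decimal places, so any rate below 0.0005% (5 tokens per million) would also display as 0.000%.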