    Explanations

    Attends to mentions of the token "not" that appear in combination with other tokens later in the sequence.

    Head Attr Weights
    0:0.08
    1:0.11
    2:0.10
    3:0.04
    4:0.27
    5:0.28
    6:0.04
    7:0.04
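
    The weights listed above can be summarized with a few lines of Python. This is only a sketch for reading the numbers: the name head_attr_weights and the helper top_heads are hypothetical, and it does not assume the weights are normalized to sum to 1.

```python
# Head attribution weights copied from the list above (head index -> weight).
# The variable and function names are illustrative, not part of any tool's API.
head_attr_weights = {0: 0.08, 1: 0.11, 2: 0.10, 3: 0.04,
                     4: 0.27, 5: 0.28, 6: 0.04, 7: 0.04}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weight, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_attr_weights.values())
top = top_heads(head_attr_weights)
# Share of the listed weight carried by the top-k heads.
top_share = sum(w for _, w in top) / total

print("top heads:", top)
print("total listed weight:", round(total, 2))
print("share held by top heads:", round(top_share, 2))
```

    Under these numbers, heads 5 and 4 carry the largest weights, together a little over half of the listed total.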
    Negative Logits
    Token               Logit
    "king"              -0.42
    "LookAnd"           -0.39
    " Holman"           -0.34
    " Corea"            -0.33
    " Lipa"             -0.33
    "KING"              -0.32
    "REQ"               -0.31
    " GILBERT"          -0.31
    "ẩn"                -0.31
    "Clic"              -0.30
    Positive Logits
    Token               Logit
    "<!["               0.41
    "parsedMessage"     0.40
    "mbggenerated"      0.39
    "ACTERS"            0.38
    " ComVisible"       0.38
    "xffffffff"         0.37
    "eniably"           0.37
    "enschappelijke"    0.37
    "databind"          0.36
    "extAlignment"      0.36
    Activation Density: 0.062%

    No Known Activations