INDEX
    Explanations

    attends to numerical values from unrelated tokens

    Head Attr Weights
    0:0.11
    1:0.13
    2:0.10
    3:0.11
    4:0.10
    5:0.11
    6:0.13
    7:0.17
    Negative Logits
    LookAnd
    -0.42
    -0.41
     AssemblyCulture
    -0.37
     onOptions
    -0.32
     للاسماء
    -0.31
     <<<<<<<<<<<<<<
    -0.31
     XCTest
    -0.31
    igshid
    -0.31
    LEncoder
    -0.31
     Мексичка
    -0.31
    Positive Logits
     Portail
    0.27
    äumt
    0.24
    0.23
     typique
    0.23
    lcccc
    0.23
    OGND
    0.22
     CFC
    0.22
     novo
    0.21
    uolo
    0.21
    Instead
    0.21
    Act Density 0.743%

    No Known Activations