INDEX
    Explanations

    Attends to tokens marked with specific numerical patterns or symbols, from tokens inside parentheses.

    Head Attr Weights
    0:0.09
    1:0.10
    2:0.42
    3:0.06
    4:0.08
    5:0.04
    6:0.06
    7:0.12
    Negative Logits
    Token (quoted, ↵ = newline)  Logit
    ' }}"></'                    -0.47
    '}}$\\'                      -0.44
    ' }}}{'                      -0.39
    '"]))'                       -0.38
    '"],↵'                       -0.38
    '"]),'                       -0.38
    'vernight'                   -0.37
    'lgari'                      -0.36
    '"])↵'                       -0.36
    'Personensuche'              -0.36
    Positive Logits
    Token (quoted)      Logit
    ' sacco'            0.30
    ' Byers'            0.28
    ' altı'             0.28
    ' urbanas'          0.26
    '-${'               0.26
    'ys'                0.26
    ' invokingState'    0.25
    'iner'              0.24
    ' közül'            0.24
    'ima'               0.24
    Activation Density: 0.005%

    No Known Activations