INDEX
    Explanations

    Attends from later tokens within complex clauses back to the first punctuation mark.

    Head Attr Weights
    0:0.07
    1:0.09
    2:0.08
    3:0.15
    4:0.14
    5:0.05
    6:0.28
    7:0.10
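
The head attribution weights above are easier to compare once parsed. The following is a minimal sketch, assuming each entry has the form `head_index:weight` as listed; the helper name `parse_head_weights` is hypothetical and not part of any tool. It finds the head with the largest weight and sums the listed weights.

```python
# Head attribution weights copied verbatim from the listing above.
RAW_WEIGHTS = """
0:0.07
1:0.09
2:0.08
3:0.15
4:0.14
5:0.05
6:0.28
7:0.10
"""

def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict {head_index: weight}.

    Hypothetical helper for illustration; assumes one entry per line.
    """
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights

weights = parse_head_weights(RAW_WEIGHTS)
top_head = max(weights, key=weights.get)   # head with the largest weight
total = sum(weights.values())              # sum of the listed (rounded) weights

print(f"top head: {top_head} (weight {weights[top_head]:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

With the values listed, head 6 carries the largest single weight (0.28), and the eight rounded weights sum to 0.96 rather than exactly 1.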
    Negative Logits
    تقاوى
    -0.51
    DeleteBehavior
    -0.47
     bezeichneter
    -0.46
    TagHelper
    -0.46
    
    -0.45
     "..\..\..\
    -0.45
     ivelany
    -0.45
    abestanden
    -0.45
    WithIOException
    -0.44
    Aiheesta
    -0.43
    Positive Logits
    ↵↵
    0.23
    于是
    0.23
     innocently
    0.23
     então
    0.22
    0.21
    olge
    0.21
    1
    0.20
    ,
    0.20
    5
    0.20
     便
    0.20
    Act Density 0.111%

    No Known Activations