INDEX
    Explanations

    attends to the sources of information from the tokens that follow them, suggesting a connection between insights or opinions and what is said afterward

    Head Attr Weights
    0:0.06
    1:0.07
    2:0.06
    3:0.09
    4:0.09
    5:0.04
    6:0.44
    7:0.10
    Negative Logits
    "timus"    -0.36
    "adina"    -0.32
    "还好"     -0.31
    "ADORA"    -0.31
    "— "       -0.29
    "omány"    -0.29
    "venty"    -0.28
               -0.28
    "ktive"    -0.28
    "myname"   -0.28
    Positive Logits
    " defaultstate"    0.44
    "__':\n"           0.43
    "RTSC"             0.41
    "FunctionFlags"    0.40
    "ArgumentParser"   0.39
    "IsMutable"        0.39
    " ['./"            0.39
    "complexContent"   0.38
    "WriteBarrier"     0.38
    " oprot"           0.37
    Act Density 0.076%

    No Known Activations