INDEX
    Explanations

    Attends to tokens indicating determination, from tokens indicating derivation or cause.

    Head Attr Weights
    0:0.06
    1:0.10
    2:0.07
    3:0.13
    4:0.44
    5:0.04
    6:0.06
    7:0.06
    Negative Logits
    ones      -0.23
    _         -0.23
    lite      -0.21
    able      -0.20
    k         -0.20
    y         -0.19
     kabul    -0.19
    7         -0.19
     ơi       -0.18
    halb      -0.18
    Positive Logits
    oredCriteria     0.60
    تقاوى            0.54
    ()]);            0.53
    '])->            0.53
    ']);\n           0.52
    }));\n           0.50
     betweenstory    0.50
    ")));\n          0.50
    })));            0.49
    '])){\n          0.48
    Act Density 0.633%

    No Known Activations