INDEX
    Explanations

    Attends from tokens representing a contrasting or opposing idea to the corresponding tokens that indicate a significant outcome or concept.

    Head Attribution Weights
    Head 0: 0.03
    Head 1: 0.09
    Head 2: 0.05
    Head 3: 0.03
    Head 4: 0.19
    Head 5: 0.47
    Head 6: 0.05
    Head 7: 0.05
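A minimal sketch of reading the weights above, assuming each value is that head's share of the feature's attribution. The page does not say how the weights are normalized, and the listed values sum to 0.96 rather than 1.0, so the sketch renormalizes by the total. The helper `dominant_head` is hypothetical.

```python
# Head attribution weights as listed above (head index -> weight).
HEAD_ATTR_WEIGHTS = {0: 0.03, 1: 0.09, 2: 0.05, 3: 0.03,
                     4: 0.19, 5: 0.47, 6: 0.05, 7: 0.05}


def dominant_head(weights: dict[int, float]) -> tuple[int, float]:
    """Return the head with the largest weight and its share of the total.

    Hypothetical helper: divides by the sum because the listed weights
    are not guaranteed to add up to exactly 1.
    """
    total = sum(weights.values())
    head = max(weights, key=weights.get)
    return head, weights[head] / total


head, share = dominant_head(HEAD_ATTR_WEIGHTS)
print(f"head {head} carries {share:.0%} of the attribution")  # head 5 carries 49% of the attribution
```

Under this reading, head 5 alone accounts for about half of the attribution and head 4 for most of the rest.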
    Negative Logits
    " kaarangay"      -0.44
    " ostavi"         -0.40
    "Personendaten"   -0.39
    " Roskov"         -0.38
    "adpleegd"        -0.38
    "Tikang"          -0.37
    " telefónica"     -0.36
    " détru"          -0.35
    " betweenstory"   -0.35
    " createSlice"    -0.35
    Positive Logits
    "opus"            0.24
    " Wikimédia"      0.23
    " Premios"        0.22
    "["               0.21
    "portál"          0.21
    "neme"            0.21
    "[]"              0.20
    "heur"            0.20
    "aggio"           0.20
    "mée"             0.20
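Token lists like the two above are commonly produced by projecting a feature's output direction onto the model's unembedding matrix and keeping the most negative and most positive entries. The page does not state its exact procedure, so the following is a minimal pure-Python sketch of that selection step under this assumption. The direction vector, the 2 x 4 unembedding `W_U`, the toy vocabulary, and the helper names are all hypothetical; none of the values come from this page.

```python
def logit_effects(direction, unembed):
    """Dot the direction with each token's column of the unembedding.

    `unembed` is a list of rows with shape (d_model, vocab), like W_U.
    """
    vocab = len(unembed[0])
    return [sum(d * row[t] for d, row in zip(direction, unembed))
            for t in range(vocab)]


def top_and_bottom(effects, tokens, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(tokens[t], effects[t]) for t in order[:k]]
    top = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return bottom, top


# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values).
tokens = ["a", "b", "c", "d"]
direction = [1.0, -0.5]
W_U = [
    [0.2, -0.4, 0.1, 0.3],   # row for model dimension 0
    [0.6, 0.2, -0.2, 0.0],   # row for model dimension 1
]
effects = logit_effects(direction, W_U)
bottom, top = top_and_bottom(effects, tokens, k=2)
print(bottom)  # most negative first: tokens "b", then "a"
print(top)     # most positive first: tokens "d", then "c"
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized top-k would be the usual choice.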
    Activation Density: 0.331%
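Activation density is typically the fraction of tokens in a sample on which the feature fires, so 0.331% corresponds to roughly 3.3 of every 1,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the threshold, the helper name `activation_density`, and the toy activations are assumptions, not values from this page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 1,000 tokens, 3 of which are active.
acts = [0.0] * 1000
for i in (10, 500, 999):
    acts[i] = 1.2
print(f"{activation_density(acts):.3f}%")  # prints 0.300%
```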

    No Known Activations