INDEX
    Explanations

    Discourse Processing, Analysis, Representation, marker

    Negative Logits
    س                    0.88
    де                   0.87
    (token not shown)    0.85
    (token not shown)    0.75
    (token not shown)    0.72
    (token not shown)    0.72
    و                    0.70
    ح                    0.70
    are                  0.68
    (token not shown)    0.68
    Positive Logits
    \                    0.73
     enraged             0.71
     relacion            0.66
     entidades           0.66
     gobern              0.66
     Partido             0.66
     Hochzeit            0.64
    ڨ                    0.64
    (token not shown)    0.64
    (token not shown)    0.63
    Act Density 0.001%

    No Known Activations