    Explanations

    the preceding specific entities

    Negative Logits
        Token       Value
                    0.59
        "ir"        0.48
        "ות"        0.48
        "te"        0.47
        "en"        0.47
        " For"      0.46
        "il"        0.45
        "ى"         0.43
        "αν"        0.42
        " A"        0.42
    Positive Logits
        Token       Value
        " to"       0.61
        "ेक्स"        0.50
        " that"     0.49
                    0.47
        " که"       0.42
        "ă"         0.42
        " la"       0.41
        "د"         0.41
        " nació"    0.41
                    0.40
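    Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The functions `logit_effects` and `top_logits`, the [d_model][vocab_size] layout, and the toy vocabulary are all hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: unembed is laid out as [d_model][vocab_size], so entry j of
    the result is the dot product of decoder_dir with column j.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]

def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Toy example (hypothetical): d_model = 2, vocabulary of three tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.1],
           [0.6, 0.2, -0.8]]
vocab = ["a", "b", "c"]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_logits(effects, vocab, k=1)
print(positive, negative)
```

    Sorting the full vocabulary keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.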
    Activation Density: 2.018%
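    Activation density is usually read as the share of sampled tokens on which the feature fires. The page gives neither the activation threshold nor the token sample, so this is a minimal sketch that assumes a token counts as active when its activation is strictly greater than zero. The function `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds threshold.

    Assumption: "active" means activation > threshold, with threshold 0.0.
    """
    total = 0
    active = 0
    for sequence in activations:
        for value in sequence:
            total += 1
            if value > threshold:
                active += 1
    return 100.0 * active / total if total else 0.0

# Hypothetical activations for two short sequences of four tokens each.
acts = [[0.0, 1.2, 0.0, 0.0],
        [0.0, 0.0, 3.1, 0.0]]
print(activation_density(acts))  # → 25.0
```

    Under this definition, a density of 2.018% means roughly 2 of every 100 sampled tokens activate the feature.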

    No Known Activations