    Explanations

    pronouns and following context

    Negative Logits
    "ו"  0.84
    "?"  0.73
    "on"  0.71
    "و"  0.65
    " be"  0.62
    "P"  0.60
    (blank)  0.58
    "ity"  0.55
    "kra"  0.55
    "light"  0.54
    Positive Logits
    (blank)  0.65
    " augmenté"  0.55
    "ين"  0.53
    " הראש"  0.51
    "에게"  0.50
    " successivo"  0.50
    " QSOs"  0.49
    "0"  0.49
    (blank)  0.49
    " in"  0.48
    Activation Density: 2.635%

    No Known Activations