INDEX
    Explanations
    Negative Logits
    >          0.51
    of         0.41
     одежды    0.40
    in         0.38
     of        0.38
     weren     0.36
    стые       0.36
    viä        0.36
     сеть      0.35
    boli       0.35
    Positive Logits
     helplessly   0.67
     attentively  0.66
     Watching     0.60
     watching     0.59
    Watching      0.57
     watch        0.57
     beobachten   0.56
     WATCH        0.51
     наблю        0.51
     atent        0.51
    Activation Density 0.049%

    No Known Activations