    Negative Logits
    Token            Logit
    " we"            -1.00
    " you"           -0.63
    " нам"           -0.55
    " genoemd"       -0.54
    " étions"        -0.53
    " our"           -0.53
    "我們"           -0.51
    "我们"           -0.50
    " нами"          -0.50
    " EXISTS"        -0.50
    Positive Logits
    Token            Logit
    "'"              0.60
                     0.60
    "athers"         0.59
    "ires"           0.58
    " perceive"      0.57
    "tubers"         0.57
    "knecht"         0.55
    " nahilalakip"   0.54
    "lloworld"       0.54
    "inage"          0.53
    Activation Density: 0.011%

    No Known Activations