INDEX
    NEGATIVE LOGITS
    Token           Logit
    "fact"          -0.08
    " hinge"        -0.08
    "ummings"       -0.08
    "mid"           -0.07
    " factoring"    -0.07
    " examining"    -0.07
    "_IND"          -0.07
    " vent"         -0.07
    " confirm"      -0.07
    "middleware"    -0.07
    POSITIVE LOGITS
    Token           Logit
    " nelle"        0.09
    "овыми"         0.08
    " Bes"          0.08
    " ihren"        0.08
    " XK"           0.07
    " denen"        0.07
    ".patient"      0.07
    " Guerre"       0.07
    " João"         0.07
    "кот"           0.07
    Activation Density: 0.007%

    No Known Activations