INDEX
    Explanations
    Negative Logits
    Token            Value
    "emia"           -0.08
    "combat"         -0.08
    " categorize"    -0.07
    " testify"       -0.07
    " find"          -0.07
    " trochę"        -0.07
    " букв"          -0.07
    "ędzy"           -0.07
    " doesn't"       -0.07
    " slam"          -0.07
    Positive Logits
    Token            Value
    " siquiera"      0.10
    " embarking"     0.09
    " preconce"      0.09
    " blindly"       0.08
    " başlam"        0.08
    " lest"          0.08
    " endgült"       0.08
    " pursuing"      0.08
    " outset"        0.08
    " قبل"           0.08
    Activation Density: 0.088%

    No Known Activations