INDEX
    Explanations
    No Explanations Found
    Negative Logits
     номер          0.61
     ذکر            0.60
     napis          0.59
     طرف            0.57
     Mention        0.56
     mentioning     0.55
     yelling        0.55
     прода          0.54
     страш          0.53
                    0.53
    Positive Logits
     understand     1.01
     navigate       0.99
     proactively    0.94
     explore        0.89
     rediscover     0.89
     nurture        0.88
     comprehend     0.86
     overcome       0.85
     discover       0.85
     alleviate      0.84
    Act Density 3.957%

    No Known Activations