    NEGATIVE LOGITS
    "כ"             -0.08
    " دوب"          -0.07
    " thrills"      -0.07
    "hann"          -0.07
    " shampo"       -0.07
    " metrics"      -0.07
    "_VAR"          -0.07
    " هاتف"         -0.07
    " превыш"       -0.06
    " mesurer"      -0.06
    POSITIVE LOGITS
    " gedeelt"       0.11
    " partial"       0.10
    "Partial"        0.10
    "partial"        0.10
    " parcialmente"  0.09
    " ज्ञ"            0.09
    " জানা"          0.09
    " తెలిస"         0.09
    " parcial"       0.09
    " partially"     0.09
    Act Density 0.013%

    No Known Activations