    Negative Logits
    ₁+               0.47
    hangi            0.46
     Yatha           0.46
    ותו              0.46
    ccgi             0.45
    טו               0.45
    [token missing]  0.44
    лог              0.44
    💪               0.44
    ."'              0.44
    Positive Logits
     =               0.53
    [token missing]  0.50
     is              0.49
     are             0.46
     new             0.46
     strict          0.45
    ——               0.45
    [token missing]  0.45
    [token missing]  0.44
     है              0.43
    Activation Density 0.051%

    No Known Activations