    Explanations

    Hebrew and Arabic letters

    Negative Logits
    ある  1.91
    م  1.88
    [token not shown]  1.84
    什么  1.80
    이었다  1.80
    이었  1.72
    节省  1.70
    ת  1.70
     roadblocks  1.63
    [token not shown]  1.63
    Positive Logits
    n  2.27
    am  1.98
    ੍ਹ  1.68
    l  1.65
    stove  1.63
    jacke  1.55
    nivel  1.48
    enquête  1.48
    straction  1.45
    ufl  1.45
    Act Density 0.020%

    No Known Activations