INDEX
    NEGATIVE LOGITS
    "י"              0.51
    "ের"             0.50
    "ה"              0.50
    ")"              0.49
    "ל"              0.49
    " ancak"         0.48
    "ی"              0.47
    ":"              0.47
    "↵↵"             0.44
    "ין"             0.43
    POSITIVE LOGITS
    " ("             0.53
    " of"            0.50
    " at"            0.44
    "k"              0.43
    "raded"          0.42
    "ty"             0.41
    "nes"            0.40
    "run"            0.40
    "ministerium"    0.38
    " this"          0.38
    Activation Density: 10.030%

    No Known Activations