INDEX
    Explanations

    names ending in 'a' or 'i'

    Negative Logits
    '6'    0.77
    ':'    0.74
    '9'    0.73
    '5'    0.72
    '4'    0.72
    '.:'   0.71
    ' ='   0.68
    '7'    0.68
    '):'   0.66
    ' :'   0.64
    Positive Logits
    'd'     1.00
    's'     0.86
    'ל'     0.86
    'k'     0.83
    'kf'    0.80
    'sama'  0.75
    'ה'     0.73
    'שׁ'     0.71
    'c'     0.71
    'f'     0.71
    Act Density 0.000%

    No Known Activations