    Explanations

    Hebrew letters י and ו

    Negative Logits
    Token     Value
    6         0.82
    9         0.79
    5         0.76
    4         0.75
    8         0.72
    3         0.68
    یم        0.66
    7         0.64
    =(        0.64
    ります    0.64

    Positive Logits
    Token          Value
    ي              0.87
    z              0.82
    <unused411>    0.80
    <unused514>    0.80
    ו              0.79
    י              0.79
    <unused626>    0.79
    و              0.78
    <unused365>    0.78
    <unused455>    0.78

    Act Density 0.184%

    No Known Activations