    Negative Logits
        0.96  k
        0.96  ח
        0.89  (
        0.87  :
        0.86  ک
        0.86  (token not rendered)
        0.82  حی
        0.79  '
        0.79  س
        0.78  "
    Positive Logits
        0.77  ým
        0.68  (token not rendered)
        0.68  ␣Electrons
        0.65  ␣electrons
        0.62  ␣have
        0.62  electrons
        0.61  يو
        0.59  ého
        0.59  ů
        0.58  ów
    (␣ marks a leading space in the token.)
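The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way to build such lists is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below assumes exactly that, uses toy numbers, and ignores any final layer norm or rescaling the dashboard may apply. `logit_effects` and `top_logits` are hypothetical helper names, and in this sketch negative entries keep their sign.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto the unembedding matrix.

    Assumes w_u is indexed [d_model][vocab], so the result holds one
    logit contribution per vocabulary token (no layer norm applied).
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[r] * w_u[r][v] for r in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_logits(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # smallest scores first
    positive = ranked[::-1][:k]  # largest scores first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, 0.2, -0.2, 0.8],
]
negative, positive = top_logits(logit_effects(decoder_dir, w_u), vocab, k=2)
print([tok for tok, _ in negative])  # ['b', 'd']
print([tok for tok, _ in positive])  # ['c', 'a']
```

With a real model, `decoder_dir` would be one row of the autoencoder's decoder weights and `w_u` the model's unembedding, so the vocabulary would run to tens of thousands of tokens rather than four.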
    Activation density: 0.006%

    No Known Activations
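The activation density is a percentage of token positions. Assuming it counts positions where the feature's activation is strictly positive over some sampled corpus (both the threshold and the corpus are assumptions, since neither is stated here), it reduces to a simple ratio. At 0.006%, the feature would fire on roughly 6 of every 100,000 tokens. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The strict `> threshold` rule is an assumption for this sketch.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 6 non-zero activations among 100,000 tokens.
sample = [0.0] * 100_000
for i in range(0, 60_000, 10_000):
    sample[i] = 1.3
print(activation_density(sample))  # 0.006
```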