INDEX
    Explanations
    Negative Logits
    Token       Logit
    "h"         1.14
    "س"         1.14
    (blank)     1.12
    "al"        1.11
    "ant"       1.05
    "f"         1.00
    "ના"        0.99
    "ח"         0.98
    "د"         0.97
    "יי"        0.96
    Positive Logits
    Token       Logit
    "н"         1.20
    (blank)     1.18
    "n"         0.88
    (blank)     0.87
    " a"        0.82
    " STATE"    0.80
    (blank)     0.80
    (blank)     0.77
    "іль"       0.76
    "м"         0.76
    Activation Density: 0.004%

    No Known Activations