    NEGATIVE LOGITS
    Token       Logit
    h           1.61
    är          1.17
    ography     1.13
    al          1.10
    y           1.09
    -           1.06
    ing         1.02
    ,           0.98
    ij          0.97
    ili         0.97
    POSITIVE LOGITS
    Token       Logit
    ان          1.57
    ي           1.39
    ка          1.38
    ות          1.35
    ك           1.34
    м           1.30
    ма          1.25
    ла          1.23
    an          1.16
    DEBUG       1.16
    Activation Density: 0.010%

    No Known Activations