    Negative Logits
    Token      Logit
    (blank)    1.75
    ש          1.74
    gi         1.66
    (blank)    1.66
    на         1.61
    ra         1.58
    ri         1.57
    in         1.52
    ter        1.52
    (blank)    1.52
    Positive Logits
    Token             Logit
    ه                 1.95
    (blank)           1.70
    ין                1.52
     modernization    1.45
     alamu            1.44
    ి                 1.43
    (blank)           1.38
    ের                1.36
    ズム              1.36
     причем           1.36
    Activation Density: 0.082%

    No Known Activations