INDEX
    Explanations
    Negative Logits
    Token           Value
    "↵↵↵↵↵↵↵↵↵"     0.43
    " был"          0.42
    " assayed"      0.42
    "↵↵↵↵↵↵↵↵"      0.41
    "]"             0.40
    " četiri"       0.40
    "↵↵↵"           0.38
    "ресень"        0.38
    " slumped"      0.38
    ";"             0.38
    Positive Logits
    Token           Value
    "ل"             0.72
    "h"             0.70
    "u"             0.69
    "l"             0.64
    "و"             0.63
    "ли"            0.59
    "can"           0.58
    "al"            0.56
    "il"            0.55
    "it"            0.54
    Activation Density: 0.003%

    No Known Activations