INDEX
    Explanations
    Negative Logits
    в  0.46
    но  0.45
    2  0.41
    от  0.37
    вят  0.35
     precedenti  0.35
     nhàng  0.34
     инструкции  0.33
    ган  0.33
     namani  0.33
    Positive Logits
    \  0.43
    N  0.43
    J  0.38
    L  0.38
    K  0.38
    נ  0.38
    S  0.37
    U  0.37
    T  0.36
    B  0.35
    Act Density 0.394%

    No Known Activations