INDEX
    Explanations
    NEGATIVE LOGITS
    Token          Value
    "↵↵"           1.50
    (not shown)    1.08
    " "            1.05
    "9"            0.91
    " раздел"      0.87
    "ти"           0.86
    (not shown)    0.86
    " важ"         0.86
    "↵↵↵↵"         0.85
    (not shown)    0.85
    POSITIVE LOGITS
    Token          Value
    "in"           1.34
    "í"            1.32
    "রা"           1.27
    "the"          1.23
    "ia"           1.23
    "ul"           1.19
    "g"            1.19
    "m"            1.17
    "s"            1.14
    "ina"          1.13
    Activation Density: 0.001%
    No Known Activations