    NEGATIVE LOGITS
    Token    Logit
             1.47
    na       1.46
    la       1.45
    lara     1.42
    ل        1.40
    rary     1.36
    на       1.28
    lja      1.27
    om       1.26
    sk       1.24
    POSITIVE LOGITS
    Token    Logit
    6        1.65
    7        1.41
    9        1.37
    8        1.30
    0        1.29
    5        1.25
    4        1.23
    3        1.17
    2        1.09
     steal   1.05
    Activation Density: 0.001%

    No Known Activations