    Negative Logits
    "ד"        1.29
    "ра"       1.26
    "де"       1.24
    "чення"    1.20
    "of"       1.15
    "ى"        1.12
    "ме"       1.11
    "ння"      1.09
    "۰"        1.05
    "𝘴"        1.05
    Positive Logits
    "'"        1.48
    " it"      1.43
    " n"       1.41
    "ت"        1.29
    ":"        1.22
    "in"       1.18
    " I"       1.15
    " this"    1.14
    " climb"   1.13
    "ب"        1.12
    Activation density: 0.003%

    No known activations.