    Negative Logits
    Token      Value
    "."        1.96
    "the"      1.57
    "c"        1.39
    "ف"        1.34
    "]"        1.27
    ")"        1.26
               1.24
    " at"      1.21
    " the"     1.09
    "1"        1.05
    Positive Logits
    Token      Value
    " I"       1.28
    " В"       1.15
    " У"       1.14
    " А"       1.13
    " Д"       1.12
    " П"       1.09
    " О"       1.06
    " Ц"       1.04
    " реки"    1.03
    " Α"       1.02
    Activation Density: 0.004%

    No Known Activations