    Negative Logits
        Token               Logit
        "mathbf"            -0.85
        "AB"                -0.84
        "downarrow"         -0.79
        " функции"          -0.78
        (no token shown)    -0.77
        "Wow"               -0.77
        "ACH"               -0.77
        "atellite"          -0.77
        "Iter"              -0.76
        "lei"               -0.76
    Positive Logits
        Token               Logit
        " levels"           5.00
        " level"            4.22
        "levels"            3.63
        " Levels"           3.45
        "Levels"            3.42
        " LEVELS"           2.97
        "LEVEL"             2.94
        " niveles"          2.94
        "level"             2.91
        " уровня"           2.91
    Activation Density: 0.078%

    No Known Activations