INDEX
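    Negative Logits
    " функция"     0.67   (Russian: "function")
    " λειτουργ"    0.64   (Greek stem of λειτουργία: "function, operation")
    "𝑦"            0.63
    "юнча"         0.62
    "ту"           0.61
    "ни"           0.60
    " функцию"     0.58   (Russian: "function", accusative)
    "postos"       0.58
    "ции"          0.56
    " функ"        0.55   (truncated Russian "функ-": "func-")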
    Positive Logits
    " ("           0.71
    "AN"           0.68
    "os"           0.62
    "ara"          0.62
    ".*;"          0.62
    "ant"          0.61
    "ia"           0.61
    " are"         0.57
    " from"        0.56
    "um"           0.55
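The two lists above presumably rank vocabulary tokens by how strongly the feature pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's output direction through the model's unembedding matrix and take the k lowest and k highest scores. The sketch uses hypothetical names (`logit_effects`, `top_logits`, `direction`, `unembed`) and plain Python lists. It returns signed scores; how this dashboard signs or rescales its displayed values is not stated.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: one way to rank tokens by their logit effect.


def logit_effects(
    direction: Sequence[float], unembed: Sequence[Sequence[float]]
) -> List[float]:
    """Score each token as the dot product of `direction` with its unembedding column.

    Assumes `unembed` is laid out as d_model rows by vocab_size columns.
    """
    d_model = len(direction)
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens as (token, score) pairs."""
    if k <= 0:
        return [], []
    order = sorted(range(len(effects)), key=lambda v: effects[v])  # ascending score
    negative = [(vocab[v], effects[v]) for v in order[:k]]
    positive = [(vocab[v], effects[v]) for v in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    direction = [1.0, 0.0]
    unembed = [[0.5, -0.2, 0.9], [0.3, 0.3, 0.3]]  # 2 x 3 toy matrix
    vocab = ["a", "b", "c"]
    neg, pos = top_logits(logit_effects(direction, unembed), vocab, k=1)
    print(neg, pos)  # [('b', -0.2)] [('c', 0.9)]
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other; for a real vocabulary a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid the full sort.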
    Activation Density 0.009%
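Activation density is usually read as the share of tokens on which the feature fires. A minimal sketch, assuming a token counts as firing whenever the activation is strictly above a threshold of 0; the `activation_density` helper and its `threshold` parameter are hypothetical, not this dashboard's code:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation is strictly above `threshold` (assumed rule)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    acts = [0.0] * 99_991 + [1.3] * 9  # toy data: 9 firing tokens out of 100,000
    print(f"{activation_density(acts):.3f}%")  # 0.009%
```

Under that reading, 0.009% corresponds to roughly 9 firing tokens per 100,000.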

    No Known Activations