    Explanations

    ratios and comparisons in the context of training or performance

    Negative Logits
    "еÑĢи"        -0.16
    "еÑĢÑĸ"        -0.15
    " Kraj"       -0.15
    "interop"     -0.14
    "overe"       -0.14
    "rys"         -0.14
    "©"           -0.14
    "medi"        -0.14
    "éry"         -0.14
    "lys"         -0.14
    Positive Logits
    "ratio"        0.18
    " ratio"       0.17
    "ÃŃd"          0.16
    "ixe"          0.15
    "contro"       0.15
    "ixin"         0.15
    "conto"        0.14
    "ixa"          0.14
    "entrant"      0.14
    "upp"          0.14
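    Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. This page does not say how its values were computed, so treat the following as a minimal sketch under that assumption. The names `logit_effects`, `top_logits`, `direction`, and `unembed` are hypothetical, and the toy numbers are illustrative only, not taken from this feature.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Score each token by the dot product of a (hypothetical) feature
    decoder direction with that token's column of the unembedding matrix.

    direction: length d_model
    unembed:   d_model rows x len(vocab) columns
    """
    if len(unembed) != len(direction):
        raise ValueError("unembed must have one row per direction entry")
    scores: Dict[str, float] = {}
    for v, token in enumerate(vocab):
        scores[token] = sum(direction[i] * unembed[i][v]
                            for i in range(len(direction)))
    return scores


def top_logits(scores: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative k, most positive k) token scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up values).
    vocab = ["ratio", " ratio", "lys", "medi"]
    unembed = [[0.2, 0.1, -0.1, 0.0],
               [0.0, 0.1, -0.2, -0.1]]
    scores = logit_effects([1.0, 0.5], unembed, vocab)
    negative, positive = top_logits(scores, 2)
    print([t for t, _ in positive])  # ['ratio', ' ratio']
    print([t for t, _ in negative])  # ['lys', 'medi']
```

Token strings keep their leading space (" ratio" vs "ratio") because tokenizers treat them as distinct vocabulary entries, which is why both appear separately in the lists.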
    Activation Density: 0.081%
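    Activation density is usually read as the percentage of sampled token positions on which the feature fires. At 0.081%, that is roughly 81 tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The sampling dataset and the exact threshold this page used are not stated, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 8 active positions out of 10,000 gives 0.08%, close to this feature's density.
    acts = [0.0] * 9992 + [1.3] * 8
    print(activation_density(acts))
```

A density this low also means few example activations are likely to be found in a small sample.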

    No Known Activations