INDEX
    Explanations

    linear combination/transformations

    Negative Logits
    Token                Value
    "t"                  2.28
    "l"                  1.25
    "m"                  1.10
    "en"                 1.00
    (token not shown)    1.00
    (token not shown)    0.98
    "j"                  0.98
    "től"                0.94
    "de"                 0.92
    "d"                  0.91
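    Positive Logits
    Token                Value
    (token not shown)    1.02
    " be"                0.85
    "TI"                 0.84
    "не"                 0.82
    " a"                 0.81
    "р"                  0.80
    (token not shown)    0.79
    (token not shown)    0.75
    (token not shown)    0.75
    (token not shown)    0.75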
    Act Density 0.024%

    No Known Activations