INDEX
    Negative Logits
    Token            Value
    " are"           1.34
    " in"            1.22
    " e"             1.06
    "\"              1.02
    " disappoint"    0.96
    " to"            0.95
    " as"            0.94
    " so"            0.93
    " câștig"        0.89
    " be"            0.87
    Positive Logits
    Token               Value
    "n"                 2.08
    "m"                 2.03
    "b"                 1.73
    "is"                1.73
    "p"                 1.56
    "ac"                1.42
    "r"                 1.42
    "y"                 1.37
    (blank in source)   1.30
    "f"                 1.29
    Act Density 0.067%

    No Known Activations