INDEX
    Explanations

    semicolon terminating list items

    NEGATIVE LOGITS
    Token       Value
    "\"         0.97
    "يل"        0.94
    "("         0.94
    " that"     0.77
    " surm"     0.75
    "л"         0.75
    " tomto"    0.73
    " zuf"      0.72
    "ل"         0.72
    "v"         0.71
    POSITIVE LOGITS
    Token         Value
    (not shown)   1.14
    (not shown)   1.07
    "to"          1.05
    "however"     0.95
    "ки"          0.94
    (not shown)   0.93
    "もら"        0.90
    "きた"        0.89
    (not shown)   0.88
    "ע"           0.85
    Act Density 0.024%

    No Known Activations