INDEX
    NEGATIVE LOGITS
    (token missing)   0.43
    "ного"            0.37
    "ла"              0.37
    "ת"               0.36
    " CharPtr"        0.36
    (token missing)   0.35
    "ان"              0.34
    "Daten"           0.34
    "ز"               0.34
    "д"               0.34
    POSITIVE LOGITS
    " be"    0.59
    " the"   0.48
    " it"    0.47
    "l"      0.47
    " a"     0.46
    "t"      0.42
    "\"      0.40
    "("      0.40
    " in"    0.35
    "*"      0.34
    Act Density 0.048%

    No Known Activations